1. General Description

LAS is developing a suite of Name Search tools (i.e., APIs) that can be integrated within an existing
customer application or can be used to provide the "guts" of a new customer application. The LAS
Name Search Suite of Tools shall:

• be composed of one or more C++ APIs 

• be compatible with any modern platform with a C++ compiler

• provide mechanisms to : 

• compare a query name with one or more candidate names to produce an ordered 
list of candidate names with the highest probability of representing the same 
"named" person. This functionality is referred to as the Name Comparison 
Tool(s) in the remainder of this document.

• generate and store intelligent search data for use in extracting relevant subsets of 
data from large data bases for further evaluation. These mechanisms will facilitate 
more efficient name searching while ensuring complete and accurate results. This 
functionality is referred to as the Name Extraction Tool(s) in the remainder of this
document. 

The initial offering of the APIs will provide developers with the capability to:

• compare two names to determine the probability that they both represent the same named
individual; or

• compare a single query name with a set of candidate names to determine which candidate 
names are most likely to represent the same named individual. 

When a set of candidate names is evaluated, the APIs enable the developer to define the criteria for
producing his/her own Result Set. The available options for defining a Result Set include the
following:

• an unordered list of all candidate names whose name score exceeds a pre-defined name
threshold (e.g., if the threshold = 0, all candidate names will be returned in an unordered
list);

• an ordered list of all candidate names whose name score exceeds a pre-defined name
threshold (e.g., if the threshold = 0, all candidate names will be returned in an ordered list);
or

• an ordered list of the top X candidate names whose name score exceeds a pre-defined 
name threshold, where X is a number. 
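
The three Result Set options above can be sketched as follows. This is a minimal illustration only: the names `ScoredName`, `ResultMode`, and `buildResultSet` are hypothetical and are not part of the LAS API. The comparison uses `>=` so that a threshold of 0 returns every candidate, as in the example above; whether the real comparison is strict is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical pairing of a candidate name with its name score.
struct ScoredName {
    std::string name;
    double score;
};

// The three Result Set options listed in the specification.
enum class ResultMode { UnorderedAboveThreshold, OrderedAboveThreshold, TopX };

// Builds a Result Set from already-scored candidates.
// topX is only used when mode == ResultMode::TopX.
inline std::vector<ScoredName> buildResultSet(const std::vector<ScoredName>& candidates,
                                              double threshold, ResultMode mode,
                                              std::size_t topX = 0) {
    std::vector<ScoredName> out;
    for (const ScoredName& c : candidates) {
        if (c.score >= threshold) out.push_back(c);  // assumed non-strict comparison
    }
    if (mode == ResultMode::UnorderedAboveThreshold) return out;

    // Highest score first; stable_sort keeps input order among equal scores.
    std::stable_sort(out.begin(), out.end(),
                     [](const ScoredName& a, const ScoredName& b) { return a.score > b.score; });
    if (mode == ResultMode::TopX && out.size() > topX) out.resize(topX);
    return out;
}
```

A caller would score each candidate first and then pick the mode that matches the desired Result Set.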

1.1 LAS Name Comparison Tools

The LAS Name Comparison tools include:

• NameCheck - This tool employs multiple evaluation techniques to evaluate and score two
names. The NameCheck tool incorporates information regarding variations in spelling,
discrepancy in the number of name segments (amount of information included), exclusion
of expected information, and positional information in order to establish a name score,
which indicates the probability that the two names represent the same individual. The
NameCheck tool is controlled by a set of configurable parameters. The NameCheck tool
also manages and produces an ordered or unordered list of candidate names with the
highest probability of representing the same "named" person, based on the
developer-defined criteria for establishing a set of results.

• Various culture-specific tools are available as extensions to the NameCheck tool to
perform such functions as the cultural classification of name data (NameClassifier),
leveling of variations in name data to a single representation (NameRegularizer), and the
representation of name data based on phonetic similarity (PhoneticNameKey).

Version 1.0 of the LAS Name Comparison Tool(s) will establish a baseline of the minimum 
functionality necessary to perform fuzzy matching on name data. There are two additional enhanced 
versions of the tool expected to be implemented in-house, prior to producing Version 1.0 of a
commercially available product. This document defines the functionality to be incorporated into 
Version 1.0 of the tool, and in some cases, describes why certain decisions were made regarding 
specific functionality. The document also notes areas for planned future enhancement. 

1.2 LAS Name Extraction Tools

The LAS Name Extraction tools include:

• An Intelligent Search Data Generator (ISDG) which generates one or more search data 
values that facilitate extraction of relevant information from a data base for further 
comparative analysis. This tool is a critical component of any search system that must 
search large volumes of data to locate similar name data. It is not feasible to retrieve and
evaluate every name record in a data base to determine its relevance to a query name.
The ISDG provides a motivated method for retrieving all relevant information from a data
base while reducing the amount of non-relevant information retrieved. This tool can
provide significant performance improvements while also ensuring an accurate and 
complete name search. 

• Various culture-specific tools are available as extensions to the ISDG to perform such 
functions as the cultural classification of name data (NameClassifier), leveling of
variations in name data to a single representation (NameRegularizer), and the 
representation of name data based on phonetic similarity (PhoneticNameKey). 

Note that in the current version of this document there is no further discussion of the
Name Extraction Tool(s). These tools will be developed in the future.

2. Perform Error Handling
2.1 Functionality 

The tool shall establish a standard list of error codes and their associated text descriptions. 

Each function call shall return an error code whenever error checking is appropriate. 

The tool shall also provide the capability for the developer to retrieve the text associated with the error
code. 
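
A minimal sketch of this contract is shown below. The code names, their text descriptions, and the functions `errorText` and `setNameThreshold` are hypothetical and only illustrate the pattern of returning a code from each call and looking up its text; the tool's actual list of codes is not reproduced here, and the 0 to 1 threshold range is an assumption.

```cpp
#include <string>

// Hypothetical error codes; the real list is defined by the tool.
enum class ErrorCode {
    Ok = 0,
    InvalidParameter,
    ProcessingInProgress
};

// Returns the text description associated with an error code.
inline const char* errorText(ErrorCode code) {
    switch (code) {
        case ErrorCode::Ok:                   return "No error";
        case ErrorCode::InvalidParameter:     return "Invalid parameter value";
        case ErrorCode::ProcessingInProgress: return "Parameters cannot change during a query";
    }
    return "Unknown error";
}

// Example of a function call that reports its outcome through an error code.
// The valid range [0, 1] for the threshold is assumed for illustration.
inline ErrorCode setNameThreshold(double threshold, double& target) {
    if (threshold < 0.0 || threshold > 1.0) return ErrorCode::InvalidParameter;
    target = threshold;
    return ErrorCode::Ok;
}
```

A developer would check the returned code after each call and pass any non-`Ok` value to `errorText` to obtain a message for logging or display.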

The following is a list of the error codes and their meaning: 





Error Code          Meaning

[table entries not present in the source]





3. Produce Linguistic Trace 
3.1 Functionality 

Version 1.0 will not provide any Linguistic Trace functionality. 

4. Accept Input Name Data 
4.1 Input Parameters 

4.1.1 Functionality 

The tool shall verify that all input parameters have valid values as defined in the table below. 

The tool shall support several query types (i.e., pre-defined sets of parameters) to facilitate searching
the data based on different cultural or other linguistic perspectives.

Certain combinations of these parameters provide better results when addressing known 
combinations of cultural and/or other linguistic issues. 

The tool shall provide the developer with the capability of selecting (defining) one of the API-defined 
query types. 

The tool shall also provide the developer with the capability of modifying any or all of the selected 
query set parameter values. 
At a minimum, the tool shall support the following query types:

• Generic; 

• Anglo; 

• Arabic; 

• Chinese; 

• Hispanic; 

• Korean; and 

• Russian.

The tool shall not allow parameters to be changed in the middle of processing a query (see design
notes below). 
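
The sketch below illustrates one way these requirements could fit together: selecting a pre-defined query type, overriding an individual parameter, and rejecting changes while a query is being processed. The class `QuerySession`, the struct `QueryParms`, and the `nameThreshold` field are hypothetical names chosen for illustration; only the query type names and the TAQ check parameters come from this document.

```cpp
#include <string>

// Query types listed in the specification.
enum class QueryType { Generic, Anglo, Arabic, Chinese, Hispanic, Korean, Russian };

// Hypothetical parameter set; field names are illustrative only.
struct QueryParms {
    QueryType type = QueryType::Generic;
    double nameThreshold = 0.0;               // assumed parameter
    std::string givenNameCheckTAQ = "off";    // "off", "remove", or "score"
    std::string surnameCheckTAQ = "off";      // "off", "remove", or "score"
};

class QuerySession {
public:
    // Select a pre-defined query type; resets parameters to that type's defaults.
    bool selectQueryType(QueryType t) {
        if (processing_) return false;
        parms_ = QueryParms{};
        parms_.type = t;
        return true;
    }
    // Override one parameter of the selected query type.
    bool setNameThreshold(double v) {
        if (processing_) return false;  // no changes in the middle of a query
        parms_.nameThreshold = v;
        return true;
    }
    void beginQuery() { processing_ = true; }
    void endQuery() { processing_ = false; }
    const QueryParms& parms() const { return parms_; }

private:
    QueryParms parms_;
    bool processing_ = false;
};
```

In this sketch every setter returns `false` between `beginQuery()` and `endQuery()`, which keeps a single shared parameter object safe without copying it into each query.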

The set of parameters included in a query type shall include: 
4.1.2 Design Notes

1. Should the PARMS object be shared or copied to each query-name and evaluation-name
object? The following diagram illustrates how all PARMS data would have to be copied and
carried with every query object if we are to allow PARMS to be changed in the middle of
processing without affecting the queries that are already in progress. This could result in
significant overhead (i.e., memory and processing). If the PARMS object is shared, then changes
to any PARMS in the middle of processing could potentially affect processing that is already in
progress. An example of why we would want to change the PARMS in the middle of
processing is: we want to re-compare the same query name using the same candidate
names but with a different set of parameters. If we do not allow the PARMS to change, then
the developer would need to re-call the tool and have the tool re-process the query name and
the candidate names in order to compare the names with different parameter settings. The
lists of TAQs and Given Name Variants are not considered update-able in the middle of
processing. The PARMS that are considered update-able are those PARMS that the
developer sets when the tool is called to perform a comparison.

[Diagram: query objects (e.g., Query 2), each carrying its own QueryParms copy]






4.1.3 Future Version Notes 



1. The tool shall provide the developer with the capability to modify existing or establish new
query sets of pre-defined parameters (Parameter Definition Application).

4.2 Input Name Model 

4.2.1 Functionality 

The tool shall provide separate function calls for the following name models : 

• Given Name + Middle Name + Surname (GMS) 

• Given Name + Surname (GS) 

• Name (N) 

• Name 

• Surname, Given Name (SN, GN)

The developer will call the desired function and provide the relevant string values in the appropriate
function parameters.

The tool shall accept empty string parameter values to the name model function calls. This 
functionality will be provided to support a customer data base that allows null values or empty strings 
in any of the fields (e.g., middle name) defined in their name model. 

Because the tool itself utilizes the GS model, the most efficient and accurate results will be provided if
the GS model is received as input.

4.2.2 Design Notes 

1. We selected a function call approach, as opposed to passing in a single name string with
delimiters, because the function call approach will:

• make it easier for a developer to determine which call is appropriate for the business need;

• not require the developer to identify or understand irrelevant parameters;

• not require the developer to incorporate irrelevant parameters into their application
code; and

• be more efficient.

4.2.3 Future Version Notes 

1. The tool may support the following additional name models/functions:

• Given Name + Middle Name + Surname + Maiden Name (GMSM) 

• Given Name + Surname + Maiden Name (GSM) 

2. The tool may also utilize other name models besides the GS model, if deemed beneficial.

5. Preprocess Name Data

5.1 Functionality 

The tool shall preprocess input name data using the following techniques, in the order listed below:

• Identify and parse input name data into given name and surname (name fields)

    • in future, may use High Frequency Surname data to define surname field

    • in future, may move titles and qualifiers into given name field

    • in future, unknown and non-existent name values may be used to define given
      name and surname fields

• Validate input name data

• Convert name data to UPPER case

    • in future, this step may move after the TAQ processing (after conjoined TAQ
      processing is implemented)

• Preprocess Segmentation and Removal markers (Noise data)

• Parse name fields into name segments

• Identify and process unknown and non-existent name values (e.g., "FNU", "LNU")

• Identify and process minor name parts (e.g., Titles, Affixes, Qualifiers)

    • in future, identify gender, if applicable

    • in future, may identify and process morphological endings separate from TAQs

• Identify number of segments in name fields

• Identify and process Given Name Variants (Query Only)

    • in future, identify gender, if applicable

• Identify and process Surname Variants (Query Only)

5.1.1 Identify and parse input name data into given name and surname (name fields) 

5.1.1.1 Functionality 

If name data are received in a name model other than GS, then the name data shall be parsed into a 
GS model. 

If the GMS name model is provided, then the internal given name field shall be constructed by placing
the input given name in the same field with the input middle name, and the internal surname field
shall be set equal to the input surname field.

If the name model does not distinguish the data beyond a single name field (N model), then the tool
shall accept the last (i.e., right-most) name segment in the name field as the surname, and place all
other name segments in the given name field (e.g., Name: Jose Garcia Gomez -> Given Name: Jose
Garcia Surname: Gomez). The tool shall recognize the first comma in the Name field (N model) to 
represent a SN, GN model. The tool shall move the data to the left of the comma into the SN field, 
and the data to the right of the comma into the GN field (the comma shall be removed from further 
processing). 
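
The parsing rules above can be sketched as follows. The names `GSName`, `trimSpaces`, `fromGMS`, and `fromN` are hypothetical; the sketch treats only the space character as a segment separator, which is a simplification.

```cpp
#include <string>

// Internal Given Name + Surname (GS) representation.
struct GSName {
    std::string given;
    std::string surname;
};

// Removes leading and trailing spaces.
inline std::string trimSpaces(const std::string& s) {
    const std::string::size_type b = s.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    const std::string::size_type e = s.find_last_not_of(' ');
    return s.substr(b, e - b + 1);
}

// GMS model: given and middle name share the internal given name field.
inline GSName fromGMS(const std::string& given, const std::string& middle,
                      const std::string& surname) {
    GSName n;
    n.given = trimSpaces(given + " " + middle);
    n.surname = trimSpaces(surname);
    return n;
}

// N model: a first comma means "SN, GN"; otherwise the right-most
// segment is the surname and everything before it is the given name.
inline GSName fromN(const std::string& name) {
    GSName n;
    const std::string::size_type comma = name.find(',');
    if (comma != std::string::npos) {
        n.surname = trimSpaces(name.substr(0, comma));
        n.given = trimSpaces(name.substr(comma + 1));  // the comma itself is dropped
        return n;
    }
    const std::string t = trimSpaces(name);
    const std::string::size_type lastSpace = t.find_last_of(' ');
    if (lastSpace == std::string::npos) {
        n.surname = t;  // a single segment is also the last segment
        return n;
    }
    n.given = trimSpaces(t.substr(0, lastSpace));
    n.surname = t.substr(lastSpace + 1);
    return n;
}
```

With these helpers, `fromN("Jose Garcia Gomez")` and `fromN("Gomez, Jose Garcia")` both yield the given name `Jose Garcia` and the surname `Gomez`.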
The tool shall retain the original form of the parsed Given Name field and Surname field for
subsequent processing (i.e., Determine GivenNameCompressedScore, Determine
SurnameCompressedScore, and Provide Results information to return to user).

5.1.1.2 Future Version Notes

1. The tool may move all Titles and Qualifiers into the Given Name field.

2. The tool may utilize High Frequency (HF) Surname and TAQ data to determine the structure
of the name data (e.g., two HF Hispanic Surnames found in a Name string can be used to
identify a Hispanic Surname; ABU means "father of" and BIN means "son of" in Arabic names.
Reference the example above. Name: "Jose Garcia Gomez" -> Given Name: "Jose" Surname:
"Garcia Gomez").

3. With Arabic names, the tool may move all name segments other than the first name segment
(presumably the first given name) into the SN field - clearly this functionality cannot be
implemented until the NameClassifier tool is made available. Other criteria may be used, such
as TAQ values.

4. If HF Surnames are found anywhere in the GN field, the tool may move them to the SN field.
This may prove beneficial to handling multi-segment surnames such as those that occur in the 
Hispanic naming system. Sometimes HF Surnames appear in the GN field because they are 
aliases, so we must be careful with this. 

5. The tool may utilize "NFN", "NMN", and "NLN" values when creating the name fields. The only
contents of the SN field should be "NLN" or "LNU" if they occur anywhere in the name. If
"NFN", "FNU", "MNU", or "NMN" occur, then they should occur only in the GN field. However,
additional values may be allowed in the GN field. If name models other than the GS model
are utilized within the tool itself, the tool may support more sophisticated processing of these
values (e.g., the GMS).

6. Since we decided that we would not handle Maiden name data at this time, there is no special
handling of middle name data at this time. Future versions of the tool may use gender data, if
available, to manipulate the middle name when dealing with female data - only when dealing
with Anglo names, however.

5.1.2 Validate input name data 

5.1.2.1 Functionality 

The tool shall assume that all name data are person names. 

The tool shall accept name data (given name plus middle name plus surname) up to 255 characters
in total length. Thus, the tool shall support Given Name <= 255 characters, MN <= 255 characters, and
Surname <= 255 characters, since any one of these fields may contain all of the name data. The tool
shall truncate any data that exceeds the specified maximum character length.

The tool shall accept any case as input (i.e., upper, lower, or mixed case). 

The tool expects roman characters (i.e., alphabetic, numeric, and punctuation markers) as input;
however, it shall accept both roman and non-roman characters and process them in the same
manner.

Through Version 1.0, the tool shall not support double-byte character sets.

5.1.2.2 Future Version Notes

1. The tool may recognize and support double-byte character sets (e.g., Unicode).

5.1.3 Convert name data to UPPER case 

5.1.3.1 Functionality 

The tool shall change all name data to upper case. 
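
A minimal sketch of the validation and case-conversion steps, assuming single-byte characters as stated for Version 1.0. The names `kMaxNameFieldLength` and `validateAndUpperCase` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Maximum length of any one name field, per the validation rules.
const std::size_t kMaxNameFieldLength = 255;

// Truncates a field to the maximum length and converts it to upper case.
// Characters are treated as single bytes; non-alphabetic bytes pass through.
inline std::string validateAndUpperCase(const std::string& field) {
    std::string out = field.substr(0, kMaxNameFieldLength);
    for (char& c : out) {
        // Cast through unsigned char so std::toupper is well-defined for all bytes.
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Because the conversion uses the default C locale here, only the letters a to z are changed; other bytes are left untouched.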

5.1.3.2 Design Notes 

5.1.3.3 Future Version Notes

1. The tool may reference the TAQ Table to process the name data more intelligently prior to
converting the data from mixed case to upper case. TAQ data may facilitate more intelligent
segmentation of the name data (e.g., "VanDerMinten" -> "Van Der Minten" or "DAngelo" -> "D
Angelo"). The tool may look up the values "Van", "Der", and "Minten" in the TAQ Table; if
found, then the tool shall identify the values as TAQs and process them appropriately. If not
found, the tool shall rejoin any remaining non-TAQs and then convert them to upper case.

2. The tool may also reference a High Frequency Surname Table to process the name data
more intelligently prior to converting the data from mixed case to upper case. High Frequency
Surname data may facilitate more intelligent segmentation of the name data (e.g.,
"GarciaGomez" -> "Garcia Gomez"). The tool may look up the Surnames "Garcia" and
"Gomez" in the High Frequency Surname Table; if found, then the tool shall identify the
Surnames as High Frequency Surnames and process them appropriately. If not found, the
tool shall rejoin any remaining non-High Frequency Surnames and then convert them to upper
case.

3. The tool may utilize case information to process the name data more intelligently in
conjunction with TAQ and HF Surname processing (e.g., DeLaCruz -> De La Cruz or
GarciaGomez -> Garcia Gomez) to assist in determining and parsing name segments.

5.1.4 Preprocess Segmentation and Removal markers (Noise data)
5.1.4.1 Functionality 

The tool shall recognize non-alphabetic characters received in the name model. 

The tool shall reference a list of replaceable single character non-alphabetic data values to determine
what process, if any, is required for the character encountered.

The tool shall replace each item identified in the list of single character values (i.e., punctuation
and/or numbers) with its designated single character REPLACEMENT VALUE as identified in the
replacement list.

The tool shall only allow a marker to be defined as either a removal marker or a segmentation 
marker. If a marker is defined as both a removal marker and a segmentation marker, then the tool 
shall recognize the marker as a removal marker. 

The Table below illustrates the default contents of the replacement list. 

Note that the REPLACEMENT VALUE in the table below is a textual description of an empty string
(designated by NIL) or a space (designated by BLANK), which is provided for ease of reading, and is
not necessarily representative of the physical representation of the list referenced by the tool.

The tool shall recognize a list of single character markers that indicate the end of a name segment / 
beginning of a new name segment by replacing the segmentation markers with a space (designated 
by BLANK in the table below). 

The tool shall recognize standard segmentation delimiters such as tab, new line, carriage return, etc.
without them being explicitly entered into the segmentation list.

The tool shall recognize a list of markers that are designated for removal by deleting the values 
entirely from the name field (i.e., mapping each removal value to an empty value or no value; 
designated by NIL in the table below). 

The tool shall provide default lists of removal markers and segmentation markers. 

The tool shall allow the developer to provide a custom removal list. 

The tool shall also allow the developer to provide a custom segmentation list. 

The tool shall accept an empty removal list to indicate turning off removal processing. If a BLANK is 
included in the removal list, multiple segment name fields shall be recognized by the tool as a single 
segment. 
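
The marker rules above can be sketched as a single pass over the name field. The names `MarkerLists` and `preprocessMarkers` are hypothetical, and the lists used in any example call are illustrative rather than the tool's defaults.

```cpp
#include <string>

// Single-character marker lists supplied by the developer (or defaults).
struct MarkerLists {
    std::string removal;       // each character maps to NIL (deleted)
    std::string segmentation;  // each character maps to BLANK (a space)
};

// Applies removal and segmentation markers to one name field.
inline std::string preprocessMarkers(const std::string& field, const MarkerLists& lists) {
    std::string out;
    for (char c : field) {
        // A marker defined in both lists is treated as a removal marker.
        if (lists.removal.find(c) != std::string::npos) continue;
        // Tab, new line, and carriage return segment without being listed.
        const bool standardDelimiter = (c == '\t' || c == '\n' || c == '\r');
        if (standardDelimiter || lists.segmentation.find(c) != std::string::npos) {
            out += ' ';
        } else {
            out += c;
        }
    }
    return out;
}
```

Passing an empty `removal` string turns removal processing off, and placing a space in `removal` collapses a multi-segment field into a single segment, matching the two special cases described above.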

If the developer does not provide a segmentation list at all, the tool shall utilize its own default
segmentation list.

If the developer does not provide a removal list at all, the tool shall utilize its own default removal list.







Character       REPLACEMENT VALUE
BLANK           BLANK (segmentation value)
!               NIL (removal value)
(illegible)     NIL (removal value)
#               NIL (removal value)
$               NIL (removal value)
%               NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
+               NIL (removal value)
(illegible)     BLANK (segmentation value)
(illegible)     NIL (removal value)
/               NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
<               NIL (removal value)
(illegible)     NIL (removal value)
>               NIL (removal value)
?               NIL (removal value)
^               NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
(illegible)     BLANK (segmentation value)
(illegible)     NIL (removal value)
(illegible)     NIL (removal value)
0               NIL (removal value)
1               NIL (removal value)
2               NIL (removal value)
3               NIL (removal value)
4               NIL (removal value)
5               NIL (removal value)
6               NIL (removal value)
7               NIL (removal value)
8               NIL (removal value)
9               NIL (removal value)



5.1.4.2 Design Notes 

1. We decided not to implement a string translation Table or regular expressions at this time
for three primary reasons:

• We determined that it would be easier to implement single character replacements
at this time.

• We were not able to determine that there really is a need at this point to handle
more than single character replacements, as rewrite rules and other techniques
such as the TAQ processing (separate-if-conjoined, disregard, and delete) satisfied
all of the examples we could come up with for string replacements. The example
of differences in handling apostrophes was resolved by TAQ processing.

• Although regular expressions may provide the most flexible solution, our current
regular expression routines are restricted to the Windows environment and we will
need more time to investigate alternative approaches prior to their implementation
in the tool.

2. Even so, in the future, the tool may pre-process punctuation through a string replacement 
Table or through regular expressions to enable us to replace contextualized punctuation 
with some string value, if necessary. These preprocessing rules will probably be culture- 
specific, and therefore, will also require that the tool support culture-specific processing. 

3. Special non-roman characters that often appear intermixed with roman characters (e.g.,
□, □, □, □) will probably be handled with rewrite rules via regular expressions by other
functions yet to be defined for the tool.

5.1.4.3 Future Version Notes

1. We may want to look further at " " and ( ) because these values are sometimes used to
designate an alias or nickname when they appear in either the SN field or the GN field
(e.g., PITRA "PETROFF", SANTANA ANDRE or EMAN, JAN (HENNY) H.).

5.1.5 Parse name fields into name segments 
5.1.5.1 Functionality 

Name fields shall be parsed into name segments with remaining punctuation intact (i.e., any
punctuation not processed in 2.4 shall be left intact in the name data).

The tool shall define a name segment as a string of text surrounded by white space. 

The tool shall support up to 10 segments in both the SN and GN fields prior to removal of TAQs.

If more than 10 segments are provided in Version 1.0, any additional segments will simply be
excluded from the evaluation process.

The tool shall support name segments up to 30 characters in length. 
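
A minimal sketch of segment parsing under these limits. The names `kMaxSegments`, `kMaxSegmentLength`, and `parseSegments` are hypothetical, and truncating an over-long segment to 30 characters is an assumption, since the text above does not say how longer segments are handled.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

const std::size_t kMaxSegments = 10;       // per name field, before TAQ removal
const std::size_t kMaxSegmentLength = 30;  // characters per segment

// Splits a name field on white space; punctuation inside segments is left intact.
inline std::vector<std::string> parseSegments(const std::string& field) {
    std::vector<std::string> segments;
    std::istringstream in(field);
    std::string seg;
    // Segments beyond the tenth are excluded from further evaluation.
    while (segments.size() < kMaxSegments && in >> seg) {
        segments.push_back(seg.substr(0, kMaxSegmentLength));  // assumed truncation
    }
    return segments;
}
```

Stream extraction skips runs of white space, so repeated blanks left behind by marker preprocessing do not produce empty segments.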

5.1.5.2 Future Version Notes

1. The tool may segment two HF names if found conjoined.

5.1.6 Identify and process unknown and non-existent name values 

5.1.6.1 Functionality 

The tool shall recognize the following special values, which indicate unknown or non-existent
name segment values:

• "FNU" - representing first name unknown; 

• "MNU" - representing middle name unknown; 

• "LNU" - representing last name unknown; 

• "NFN" - representing no first name; 

• "NMN" - representing no middle name; and 

• "NLN" - representing no last name. 

The tool shall replace any GN segment containing "FNU", "MNU", "NFN", or "NMN" with an empty
GN segment and tag the segment as "unknown" or "not-exist".

The tool shall replace any SN segment containing "LNU" or "NLN" with an empty SN segment and
tag the segment as "unknown" or "not-exist".

The tool shall tag all other SN or GN segments as "known". 
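
The tagging rules can be sketched as below. The names `SegmentTag`, `TaggedSegment`, `NameField`, and `tagSegment` are hypothetical. The sketch treats "containing" as an exact match on the whole segment, and maps the "...U" values to "unknown" and the "N..N" values to "not-exist"; both readings are assumptions.

```cpp
#include <string>

enum class SegmentTag { Known, Unknown, NotExist };
enum class NameField { Given, Surname };

// A name segment together with its tag; value is empty for special values.
struct TaggedSegment {
    std::string value;
    SegmentTag tag;
};

// Tags one upper-cased segment according to the field it belongs to.
inline TaggedSegment tagSegment(const std::string& seg, NameField field) {
    if (field == NameField::Given) {
        if (seg == "FNU" || seg == "MNU") return {"", SegmentTag::Unknown};
        if (seg == "NFN" || seg == "NMN") return {"", SegmentTag::NotExist};
    } else {
        if (seg == "LNU") return {"", SegmentTag::Unknown};
        if (seg == "NLN") return {"", SegmentTag::NotExist};
    }
    // Everything else, including a special value in the wrong field, is "known".
    return {seg, SegmentTag::Known};
}
```

The retained tag lets later evaluation steps distinguish a name part that is missing from one that is simply not recorded.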

5.1.6.2 Design Notes

1. Tagging these special values will enable the tool to use this information during the
evaluation process.

2. The tool assumes that the name field processing has already addressed the issue that
"LNU" and "NLN" should not appear in the GN field and that "FNU", "MNU", "NFN", and
"NMN" should not appear in the SN field. Thus, there is no special handling done at this
point in the process.

5.1.7 Identify and process minor name parts (e.g., Titles, Affixes, Qualifiers)

5.1.7.1 Functionality

The tool shall identify Titles, Affixes (prefixes, suffixes, infixes), and Qualifiers (TAQs) in the name
data by referencing a Table of single segment TAQs (i.e., no complex TAQs, such as "al din") that are
defined within the context of a specified cultural group or partition. Note that the TAQ Table does not
include punctuation in Version 1.0. Future versions may include punctuation such as O'.

5.1.7.1.1 TAQ Table

The TAQ Table shall contain Titles, Affixes, and Qualifiers that are described in context of a specified
cultural group or partition (i.e., CULT-AFF-ID). In Version 1.0 of the tool, the TAQ table shall include
a "Generic" partition (i.e., non-culture specific), as well as Anglo, Arabic, Chinese, Hispanic, Korean,
and Russian. The table below lists the cultural partitions that will be included in Version 1.0:







CULT-AFF-ID    Cultural Partition
A              Arabic
C              Chinese
E              Anglo
G              Generic
H              Hispanic
K              Korean
R              Russian



A "Generic" partition shall be composed of the most commonly occurring TAQ values that can be
evaluated as a TAQ value regardless of which culture is being evaluated. In other words, "Generic"
TAQs frequently occur in multiple cultures. For example, "MR" and "PHD" often occur in a variety of
multi-cultural names. Titles and Qualifiers readily fall into the "Generic" category. Affixes are less
likely to occur in the "Generic" category.

In most cases, TAQ values that are included in the "Generic" partition will not be included in a cultural
partition even though they may be associated with a specific culture. Exceptions will occur when the
TAQ definition (TAQ-TYPE-CODE, GENDER, SEPARATE-IF-CONJOINED, SN-PROCESS-ID, or
GN-PROCESS-ID) or TAQ processing is distinct in the different cultures. For example, "SR" is an
abbreviation of the Hispanic title "Senor" as well as a Generic qualifier indicating "senior or the first".

The TAQ Table shall not be modifiable by the developer or user in Version 1.0. 

A separate data base utility will be developed to generate code representing the contents of the TAQ
Table, which is currently stored in MS Access.



An example of the contents of the TAQ Table is provided below: 
CULT-AFF-ID  TAQ    TAQ-TYPE-CODE  SEPARATE-IF-CONJOINED  GENDER  SN-PROCESS-ID  GN-PROCESS-ID
A            AABD   P              0                      U       DIS            DIS
A            AAL    P              0                      U       DIS            DIS
A            ABA    P              0                      M       DIS            DIS
A            ABBD   P              0                      U       DIS            DIS
G            ABD    P              0                      U       DIS            DIS
A            ABDAL  P              0                      U       DIS            DIS
A            ABDAN  P              1                      U       DIS            DIS
A            ABDAR  P              1                      U       DIS            DIS
A            ABDAS  P              0                      U       DIS            DIS
G            ABDEL  P              0                      U       DIS            DIS
A            ABDEN  P              1                      U       DIS            DIS
A            ABDER  P              1                      U       DIS            DIS
A            ABDES  P              1                      U       DIS            DIS



Each TAQ shall be classified as a P-Prefix, S-Suffix, T-Title, I-Infix, or Q-Qualifier (TAQ-TYPE-CODE).

Note that at the present time, we have no Infixes in the Table.



Each TAQ shall be classified as to whether or not the TAQ shall be recognized and separated from a
stem if the TAQ occurs conjoined with the stem (SEPARATE-IF-CONJOINED: 1 = true, 0 = false). Version
1 of the tool shall not process SEPARATE-IF-CONJOINED.

Each TAQ shall be assigned a gender (GENDER) of either "M" - Male, "F" - Female, or "U" -
Undefined. Version 1 of the tool shall not process GENDER.

Each TAQ shall be classified distinctly for the SN field (SN-PROCESS-ID) and GN field
(GN-PROCESS-ID) as either a DELETE TAQ ("DEL") or a DISREGARD TAQ ("DIS").

5.1.7.1.2 TAQ Processing 

If the parameter GivenNameCheckTAQ = "off", the tool shall not perform any TAQ processing in the
GN field.

If the parameter SurnameCheckTAQ = "off", the tool shall not perform any TAQ processing in the SN
field.

If the parameter GivenNameCheckTAQ = "remove" or "score", the tool shall perform TAQ processing
in the GN field as described below.

If the parameter SurnameCheckTAQ = "remove" or "score", the tool shall perform TAQ processing in
the SN field as described below.

The tool shall consider TAQs to occur anywhere within the GN field or GN segment, if the
GivenNameCheckTAQ = "remove" or "score".

The tool shall consider TAQs to occur anywhere within the SN field or SN segment, if the
SurnameCheckTAQ = "remove" or "score".

The tool shall recognize and remove all TAQs from the GN and SN name fields, retaining the
value of the TAQ as well as the classification of the TAQ as a DELETE TAQ or DISREGARD TAQ.
Each TAQ's set of retained information shall be associated with the TAQ's related name segment for
use in the evaluation process.

The tool shall recognize TAQs via the following steps:

• First, the tool shall look up each GN or SN segment to determine whether it is included in
the TAQ Table for the appropriate cultural perspective (CULT-AFF-ID) as defined by the
API-defined query type.

• If the GN or SN segment is included in the subset of the TAQ Table associated
with the selected cultural perspective, the tool shall process the TAQ according to
the culture-specific definition, as described below.

• If the GN or SN segment is not included in the subset of the TAQ Table associated
with the selected cultural perspective, and the selected cultural perspective is not
"Generic", then the tool shall look up each GN or SN segment to determine
whether it is included in the "Generic" subset of the TAQ Table.

• If the GN or SN segment is included in the "Generic" subset of the TAQ
Table, the tool shall process the TAQ according to the culture-specific
definition, as described below.

• If the GN or SN segment is not included in the "Generic" subset of the TAQ
Table, the tool shall perform no additional TAQ processing for this GN or 
SN segment. 
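
The lookup steps above amount to a culture-specific search followed by a "Generic" fallback. The sketch below assumes a hypothetical in-memory table keyed by CULT-AFF-ID and TAQ value; `TaqEntry`, `TaqTable`, and `findTaq` are illustrative names, while the fields mirror the TAQ Table columns described earlier.

```cpp
#include <map>
#include <string>
#include <utility>

// One row of the TAQ Table (columns as named in this document).
struct TaqEntry {
    char taqTypeCode;          // 'P', 'S', 'T', 'I', or 'Q'
    bool separateIfConjoined;  // not processed in Version 1
    char gender;               // 'M', 'F', or 'U'; not processed in Version 1
    std::string snProcessId;   // "DEL" or "DIS"
    std::string gnProcessId;   // "DEL" or "DIS"
};

// Key: (CULT-AFF-ID, TAQ value), e.g. ('A', "AAL").
typedef std::map<std::pair<char, std::string>, TaqEntry> TaqTable;

// Returns the matching entry, or nullptr if the segment is not a TAQ.
inline const TaqEntry* findTaq(const TaqTable& table, char cultAffId,
                               const std::string& segment) {
    // Step 1: look in the partition for the selected cultural perspective.
    TaqTable::const_iterator it = table.find(std::make_pair(cultAffId, segment));
    if (it != table.end()) return &it->second;
    // Step 2: fall back to the "Generic" partition ('G') unless already there.
    if (cultAffId != 'G') {
        it = table.find(std::make_pair('G', segment));
        if (it != table.end()) return &it->second;
    }
    // Step 3: not a TAQ; the segment is treated as a stem.
    return nullptr;
}
```

A segment for which `findTaq` returns `nullptr` receives no further TAQ processing.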

The tool shall utilize the culture-specific definition of the TAQ (information in the TAQ Table about
each TAQ) to determine its related segment in the following manner:

• Stems are defined as any segment whose value is not defined as a TAQ (i.e., is not
included in the TAQ Table);

• Any TAQ located to the left of the first stem will be associated with the first stem;

• Any TAQ located to the right of the final stem will be associated with the final stem; and

• For medial TAQs, the following rules shall apply:

• Find the rightmost suffix (as defined in the TAQ Table) following a stem;

• That suffix and any other TAQ preceding it shall be associated with the preceding
stem; and

• Any remaining TAQs shall be associated with the next stem.



The following example illustrates how TAQs will be associated with stem segments.



DOKTOR  ABD  EL  RAHMAN  NOOR   EL  DIN  ABD  EL  KADIR
T1      P1   P2  STEM1   STEM2  P3  S1   P4   P5  STEM3

T1 P1 P2 are all associated with STEM1

P3 S1 are associated with STEM2

P4 P5 are associated with STEM3











If every name segment contained in a name field is identified as a TAQ, then the tool shall associate 
all of the TAQs with a single empty segment. 
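
The association rules above can be sketched as follows. This is a minimal illustration in Python, not the tool's C++ API; the function name `associate_taqs` and the `(text, kind)` input shape are assumptions made for the example, with `kind` standing in for the TAQ Table lookup result.

```python
def associate_taqs(segments):
    """Associate TAQs with stems.

    segments: list of (text, kind) pairs in name order, where kind is "stem"
    or a TAQ type such as "title", "prefix" or "suffix" (hypothetical labels).
    Returns a list of (stem_text, [taq_texts]) pairs.
    """
    stem_positions = [i for i, (_, kind) in enumerate(segments) if kind == "stem"]
    if not stem_positions:
        # Every segment is a TAQ: attach them all to a single empty segment.
        return [("", [text for text, _ in segments])]

    groups = [[] for _ in stem_positions]
    # TAQs to the left of the first stem go with the first stem.
    groups[0].extend(text for text, _ in segments[:stem_positions[0]])

    # Medial TAQs: split each run at its rightmost suffix.
    for n in range(len(stem_positions) - 1):
        run = segments[stem_positions[n] + 1:stem_positions[n + 1]]
        cut = 0
        for j, (_, kind) in enumerate(run):
            if kind == "suffix":
                cut = j + 1
        groups[n].extend(text for text, _ in run[:cut])      # preceding stem
        groups[n + 1].extend(text for text, _ in run[cut:])  # next stem

    # TAQs to the right of the final stem go with the final stem.
    groups[-1].extend(text for text, _ in segments[stem_positions[-1] + 1:])
    return [(segments[i][0], g) for i, g in zip(stem_positions, groups)]
```

Applied to the DOKTOR ABD EL RAHMAN NOOR EL DIN ABD EL KADIR example, the sketch groups T1 P1 P2 with RAHMAN, P3 S1 with NOOR, and P4 P5 with KADIR.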



TAQs may occur conjoined with a name stem, as is the case with DeLa Cruz, O'Connor, and 
MacDougal, or they can occur as disjoined segments within a name, as in De La Cruz, O' Connor, and 
Mac Dougal. Version 1.0 of the tool shall not recognize or process conjoined TAQs. 

5.1.7.2 Design Notes 

A selected subset of the current corporate TAQ Table will be designated for inclusion in 
the product TAQ Table. 

The tool will not recognize complex TAQs, such as "al din". Previous prototypes for the 
State Department (legacy ANA) have supported complex TAQs for reasons that are not 
relevant to this tool. For more information on this issue, refer to the corporate linguistic 
data repository TAQ documentation. 

The tool will accept empty strings in the SN and GN fields, so there is no special 
processing to handle the situation when only TAQ(s) occur in one of the name fields. 

5.1.7.3 Future Version Notes 

1. The developer or user shall be provided mechanisms for adding new TAQs to the TAQ 
Table. 

2. The developer or user shall not be allowed to delete TAQs from the TAQ Table. 

3. The need for DELETE and DISREGARD tags for TAQs may be eliminated - DELETE and 
DISREGARD relations may both be replaced by a single matrix of relationships between 
TAQs (as described in the Apply Surname TAQ Factors and Apply Given Name TAQ 
Factors sections of this document). 

4. If DELETE and DISREGARD tags are maintained distinctly, the tool may also support 
DELETE and DISREGARD processing of infixes, if necessary. Currently there are no 
infixes defined in the TAQ Table. 

5. The tool may support conjoined TAQs. 

• Conjoined TAQs can be very effective in dealing with morphological endings 
(i.e., conjoined suffixes such as -son, -man, -ovich). Conjoined suffixes may be 
supported prior to culture-specific handling. Morphological endings trigger left bias 
right now, so we will need to consider left bias when implementing morphological 
lookups either as part of TAQ processing, or independent of it. 

• Conjoined TAQs will be even more effective if the tool supports culture-specific 
processing. 

• Once culture-specific processing is supported, the tool shall recognize that all TAQ 
types can be conjoined with a stem (i.e., prefix, suffix, infix, title, qualifier). 

• If a TAQ is identified as conjoined (i.e., the field SEPARATE-IF-CONJOINED = "T"), 
then the tool shall consider this TAQ if it is conjoined to a stem as well as if it is a 
stand-alone name segment (i.e., the TAQ is surrounded by white space); if the TAQ 
is identified as not conjoined (i.e., the field SEPARATE-IF-CONJOINED = "F"), 
then the tool shall consider the TAQ only if found as a stand-alone name segment. 
Thus, the SEPARATE-IF-CONJOINED field indicates whether the application 
program will search for a TAQ as an independent name segment as well as part of 
a name segment. 

• Conjoined processing shall determine whether a TAQ is conjoined either at the 
beginning or at the end of a name segment. Conjoined processing does not 
search for a TAQ anywhere within the name segment. 

• If the tool identifies a conjoined TAQ in a name segment, it shall: 

• create multiple segments by separating the TAQ(s) from the stem; and 

• then proceed with TAQ processing of the separated segments. 

6. There is an outstanding issue for handling the apostrophe - this issue will not be resolved 
in Version 1.0. The issue is that in some cases, such as with a name like "O'Connor", we 
want to separate the conjoined "O'" from "Connor", and then recognize the "O'" as a TAQ 
and process it as is defined in the TAQ Table (i.e., DISREGARD or DELETE). In a case 
such as "Ol'ga", however, we simply want to delete the apostrophe to produce "Olga". For 
Version 1.0, we are defining the apostrophe as a removal marker and mapping its 
occurrence to "NIL"; so "Oconnor" and "Olga" will be produced for the examples cited 
above. Once the tool supports conjoined TAQs, the TAQ "O" may be recognized and 
processed as a conjoined TAQ, which would produce the desired "Connor". The current 
TAQ Table has entries for "D'", "L'", and "O'" as well as their counterparts, "D", "L", and 
"O" - all of which are marked SEPARATE-IF-CONJOINED. Further analysis is required to 
determine whether it is necessary to handle the apostrophe using a different technique(s). 
If we determine that the apostrophe will always (for all foreseeable future versions of the 
tool) be removed prior to TAQ processing, then the "D'", "L'", and "O'" entries in the TAQ 
Table will no longer be necessary. 

7. The tool may attempt to define the gender of a name based on the available TAQs. 

8. In later versions of the tool, additional cultural partitions may be supported in the TAQ 
Table. 

9. In the future, the tool may process TAQs differently based on culture (note that this is part 
of the justification for separating Generic from Anglo). 

5.1.8 Identify number of segments in name fields 

5.1.8.1 Functionality 

After TAQ removal, the system shall identify the number of segments in the SN and GN fields to 
assist in producing the ordered list of the top X names. 

At a minimum, the tool shall require one SN segment and one GN segment for each name. 

The tool shall accept an empty string as a single SN segment or GN segment. 

If no segment exists after TAQ removal, the tool shall create a single empty segment for the 
appropriate name field. Thus, the tool shall recognize a single empty segment to indicate no data 
after TAQ removal. 

The tool shall tag all empty segments as "unknown". 

The tool shall support up to 5 segments in both the SN and GN fields after removal of TAQs. 

If more than 5 segments remain in a name field after TAQ removal, the additional segments will be 
excluded from the evaluation process. 
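
A minimal sketch of these segment rules follows, assuming the segments left after TAQ removal arrive as a list of strings. The function name `normalize_segments` and the "known" tag for non-empty segments are assumptions for illustration; the text only specifies the "unknown" tag for empty segments.

```python
MAX_SEGMENTS = 5  # limit stated above for both the SN and GN fields


def normalize_segments(stems):
    """Return (segment, tag) pairs for one name field after TAQ removal."""
    if not stems:
        # No data left after TAQ removal: a single empty segment, tagged "unknown".
        return [("", "unknown")]
    result = []
    for segment in stems[:MAX_SEGMENTS]:  # segments beyond the fifth are excluded
        tag = "unknown" if segment == "" else "known"  # "known" is an assumed label
        result.append((segment, tag))
    return result
```

The length of the returned list is the segment count used later when the ordered list of names is produced.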

5.1.9 Identify and process Given Name Variants (Query Name Only) 

5.1.9.1 Functionality 

5.1.9.1.1 GIVEN-NAME-VARIANT Table 

The tool shall support a single GIVEN-NAME-VARIANT Table that describes the relationship between 
two Given Names based on a specified cultural perspective (GN-CULT-AFF-ID). 

The GIVEN-NAME-VARIANT Table will consist of pairs of given names within a culture that are 
determined to be variants of one another, based on their having the same name stem. In other words, 
the type of variation defined for the contents of the GIVEN-NAME-VARIANT Table is determined 
based on a specific cultural perspective (e.g., using an English or Anglo perspective, "ELENA" and 
"HELENA" are considered "similar but different" names; however, they are considered "predictable 
spelling variants" when the pair is defined using a Hispanic perspective). 

The criteria for whether or not a pair of variants will be included in the GIVEN-NAME-VARIANT Table 
will be based on the following defined types of variation: 



Variation Values 

VARIATION TYPE                         EXAMPLE                 DEFAULT VALUE 
*Spelling variant - predictable        SEAN - SHAWN            0.95 
*Abbreviation                          MARIA - MA              0.90 
*Nickname                              FRANCISCO - PACO        0.90 
*Same root - morph difference          BUSTO - BUSTOS          0.85 
Different culture (translation)        FRANCISCO - FRANCIS     0.85 
*Related - unpredictable difference    BUSTO - BUSTONES        0.80 
Truncation                             FRANCISCO - FRANCISC    0.70 
Misspelling                            MARIA - MRAIA           0.70 
Similar name; not same root            SALAM - SALIM           0.65 
Gender                                 MARIA - MARIO           0.50 



The items marked with a * are culture-specific variants: 

• Spelling variation may or may not be taken care of by digraph matching (for the product it 
will probably handle most reasonable variation). 

• Abbreviations and nicknames depend on the culture; many, if not most, can be taken care 
of with lists that can be improved over time by restricting the culture relationship. 

• Same root/morphological difference is definitely culture-specific, since the root and 
morphological elements can only be identified within a system; many of the differences (if they 
are short) can be handled with digraphs. 

• Related/unpredictable difference is also within a cultural system; these are not the same 
name, however. Many of these differences can be handled with digraphs, too. 

• Truncation and misspelling can also be said to be culture-specific, since you have to know 
how it was spelled in the first place to know if it's misspelled or truncated. Depending on how 
these are identified (i.e., if we know for sure what they are a variant of), these should perhaps 
receive a high value (e.g., 0.90). 

Similar name; not same root variants will be included in the GIVEN-NAME-VARIANT Table to enable 
the tool to override a potentially high digraph score by assigning a lower variant score, if desired. 
Thus, a name that might qualify as a digraph variant, but which we do not consider a variant of the 
related name pair, would be less likely to qualify as a variant (e.g., "MUHAMAD" and "MAHMUD" may 
occur in the table with an associated GNV-SCORE set very low because their variation type = 
"similar but different" and we would prefer not to see them appear as variants of one another). 

In Version 1.0 of the tool, the CULT-AFF-ID will include Generic, Anglo, Arabic, Chinese, Hispanic, 
Korean, and Russian, if applicable. The table below lists the possible CULT-AFF-ID values: 









CULT-AFF-ID   Cultural perspective 
A             Arabic 
C             Chinese 
E             Anglo 
G             Generic 
H             Hispanic 
K             Korean 
R             Russian 



The GIVEN-NAME-VARIANT Table shall not be modifiable by the developer or user in Version 1.0. 

A separate data base utility will be developed to generate code representing the contents of the 
GIVEN-NAME-VARIANT Table, which is currently stored in MS Access. 



The following is a sample of the contents of the GIVEN-NAME-VARIANT Table: 





GIVEN NAME   VARIANT    GNV-SCORE   GN-CULT-AFF-ID 
AARON        AHARON     0.95        G 
AARON        ARN        0.65        G 
ABRAHAM      ABE        0.9         G 
ABRAHAM      ABRAM      0.65        G 
ABRAHAM      AVRAHAM    0.95        G 
ABRAHAM      AVROM      0.65        G 
ADAM         ADAMO      0.85        G 
ADAM         ADAN       0.85        G 
ADRIAN       ADRIEN     0.95        G 
AGNES        AGGIE      0.9         G 
AGNES        AGNESE     0.85        G 
AGNES        INES       0.85        G 
AHMED        AHMAD      0.95        G 
ALAN         AL         0.9         G 

Each entry in the GIVEN-NAME-VARIANT Table shall represent a bilateral relationship, and therefore 
only one entry will be required to support these bilateral relations (e.g., there will only be one entry in 
the table to define a relationship between "ABRAHAM" and "ABE"). 

Each entry in the GIVEN-NAME-VARIANT Table will be assigned a GNV-SCORE, which is based on 
a type of variation. The variation type will not be included in the GIVEN-NAME-VARIANT Table. 

The GIVEN-NAME-VARIANT Table will not include self-relationships (i.e., "ABRAHAM" "ABRAHAM" 
0 is not in the table). 

5.1.9.1.2 Given Name Variant Processing 

If GivenNameCheckVariant = "T", the tool shall perform the following: 

• First, the tool shall look up each query GN segment to determine whether it is included in 
the GIVEN-NAME-VARIANT Table for the appropriate cultural perspective (GN-CULT-AFF-ID) 
as defined by the API-defined query type. 

• If the query GN segment is included in the subset of the GIVEN-NAME-VARIANT 
Table associated with the selected cultural perspective, the tool shall associate all 
of its known variants within that cultural perspective, and their variation score 
(GNV-SCORE), with the query GN segment for use in the evaluation process. 

• If the query GN segment is not included in the subset of the GIVEN-NAME-VARIANT 
Table associated with the selected cultural perspective, and the selected 
cultural perspective is not "Generic", then the tool shall look up each query GN 
segment to determine whether it is included in the "Generic" subset of the 
GIVEN-NAME-VARIANT Table. 

• If the query GN segment is included in the "Generic" subset of the 
GIVEN-NAME-VARIANT Table, the tool shall associate all of its known variants 
within the "Generic" subset, and their variation score (GNV-SCORE), with 
the query GN segment for use in the evaluation process. 

• If the query GN segment is not included in the "Generic" subset of the 
GIVEN-NAME-VARIANT Table, the tool shall perform no additional Given 
Name Variant processing for this GN segment. 
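
The lookup with fallback to the "Generic" subset might be sketched as below. This is an illustration in Python rather than the tool's C++ API. The in-memory row layout, `build_index`, and `variants_for` are hypothetical names; the three rows are taken from the sample table, and each stored pair is expanded in both directions because one entry represents a bilateral relationship.

```python
# Assumed row layout: (name, variant, GNV-SCORE, GN-CULT-AFF-ID)
SAMPLE_ROWS = [
    ("AARON", "AHARON", 0.95, "G"),
    ("ABRAHAM", "ABE", 0.90, "G"),
    ("ABRAHAM", "AVRAHAM", 0.95, "G"),
]


def build_index(rows):
    """Index variants by (culture, name), expanding each bilateral entry."""
    index = {}
    for name, variant, score, culture in rows:
        index.setdefault((culture, name), {})[variant] = score
        index.setdefault((culture, variant), {})[name] = score
    return index


def variants_for(index, segment, culture):
    """Return {variant: GNV-SCORE} for a query GN segment.

    Looks in the selected cultural subset first and falls back to the
    "Generic" subset ("G") only when the segment is absent from it.
    """
    found = index.get((culture, segment))
    if found is None and culture != "G":
        found = index.get(("G", segment))
    return dict(found) if found else {}
```

For example, `variants_for(build_index(SAMPLE_ROWS), "ABE", "H")` finds nothing in the Hispanic subset of this sample and falls back to the Generic entry for ABRAHAM. An empty result means no additional variant processing for that segment.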



5.1.9.2 Design Notes 

1. In Version 1.0, the Anglo contents of the GIVEN-NAME-VARIANT Table are derived from 
the 389 given names that occurred at least 200 times in the State Department Passport 
Database. 

2. Hispanic Given Name Variants were gathered from the HNA variant table LAS generated 
for the State Department. 

3. Korean Given Name Variants were gathered from the data LAS generated for ORD-C. 

4. Chinese Given Name Variants were gathered from the DNC variant table LAS generated 
for the State Department. 

5. Arabic Given Name Variants were gathered from the DNC variant table LAS generated for 
the State Department. 

5.1.9.3 Future Version Notes 

1. The developer or user shall be provided mechanisms for adding new variants to the 
GIVEN-NAME-VARIANT Table. 

2. The developer or user shall not be allowed to delete variants from the GIVEN-NAME- 
VARIANT Table. 

3. The tool may attempt to determine the gender of a name based on the Given Name 
variants - this may be especially valuable for Hispanic given names. In order to do this, 
the GIVEN-NAME-VARIANT Table would be enhanced to include gender information. 

4. The GIVEN-NAME-VARIANT Table may support additional cultural perspectives. 

5. In the future, the tool may process GN Variants differently based on culture (note that this 
is part of the justification for separating Generic from Anglo). 

5.1.10 Identify and process Surname Variants (Query Name Only) 

5.1.10.1 Functionality 

5.1.10.1.1 SURNAME-VARIANT Table 

The tool shall support a single SURNAME-VARIANT Table that describes the relationship between 
two Surnames based on a specified cultural perspective (SN-CULT-AFF-ID). 

The SURNAME-VARIANT Table will consist of pairs of surnames within a culture that are determined 
to be variants of one another, based on their having the same name stem. In other words, the type of 
variation defined for the contents of the SURNAME-VARIANT Table is determined based on a 
specific cultural perspective. 

The criteria for whether or not a pair of variants will be included in the SURNAME-VARIANT Table will 
be based on the following defined types of variation: 



Variation Values 

VARIATION TYPE                         EXAMPLE                 DEFAULT VALUE 
*Spelling variant - predictable        GOMEZ - GOMES           0.95 
*Abbreviation                          GOMEZ - GOM             0.90 
*Same root - morph difference          BUSTO - BUSTOS          0.85 
*Related - unpredictable difference    BUSTO - BUSTONES        0.80 
Truncation                             FRANCISCO - FRANCISC    0.70 
Misspelling                            GOMEZ - GMEZ            0.70 
Similar name; not same root            GOMEZ - GAMEZ           0.65 



The items marked with a * are culture-specific variants: 



• Spelling variation may or may not be taken care of by digraph matching (for the product 
it will probably handle most reasonable variation). 

• Abbreviations and nicknames depend on the culture; many, if not most, can be taken 
care of with lists that can be improved over time by restricting the culture relationship. 

• Same root/morphological difference is definitely culture-specific, since the root and 
morphological elements can only be identified within a system; many of the differences (if 
they are short) can be handled with digraphs. 

• Related/unpredictable difference is also within a cultural system; these are not the same 
name, however. Many of these differences can be handled with digraphs, too. 

• Truncation and misspelling can also be said to be culture-specific, since you have to 
know how it was spelled in the first place to know if it's misspelled or truncated. 
Depending on how these are identified (i.e., if we know for sure what they are a variant of), 
these should perhaps receive a high value (e.g., 0.90). 

Variant Surnames based on morphological endings will only be included if they will not be handled by 
other processing (i.e., conjoined TAQ (i.e., suffix) removal processing or by a morphological lookup 
table that will trigger the left bias factor). In other words, we are focusing on stem variations. 

Similar name; not same root variants will be included in the SURNAME-VARIANT Table to enable the 
tool to override a potentially high digraph score by assigning a lower variant score, if desired. Thus, a 
name that might qualify as a digraph variant, but which we do not consider a variant of the related 
name pair, would be less likely to qualify as a variant. 

The following additional types of variation will not be included in the SURNAME-VARIANT Table 
(even though they are included in the GIVEN-NAME-VARIANT Table): 

• Nicknames; 

• Different culture (translation); and 

• Gender variants. 

In Version 1.0 of the tool, the CULT-AFF-ID will include Generic, Anglo, Arabic, Chinese, Hispanic, 
Korean, and Russian, if applicable. The table below lists the possible CULT-AFF-ID values: 







CULT-AFF-ID   Cultural perspective 
A             Arabic 
C             Chinese 
E             Anglo 
G             Generic 
H             Hispanic 
K             Korean 
R             Russian 



The SURNAME-VARIANT Table shall not be modifiable by the developer or user in Version 1.0. 

A separate data base utility will be developed to generate code representing the contents of the 
SURNAME-VARIANT Table, which is currently stored in MS Access. 

The following is a sample of the contents of the SURNAME-VARIANT Table: 



SURNAME      VARIANT     SNV-SCORE   SN-CULT-AFF-ID 
ACOSTA       COSTA       0.85        H 
AGUILAR      AGUILA      0.65        H 
AGUILAR      AGUILERA    0.65        H 
AGUILERA     AGUILA      0.85        H 
AGUILERA     AGUILAR     0.65        H 
ALBA         ALBAN       0.65        H 
ALCANTARA    ALCANTAR    0.85        H 
ALDANA       ALDAMA      0.8         H 
ALMANZAR     ALMANZA     0.65        H 
ALONSO       ALONZO      0.95        H 
ALONZO       ALONSO      0.95        H 
ALVARADO     ALVARDO     0.8         H 
ALVAREZ      ALVARES     0.95        H 
ALVAREZ      ALVARO      0.65        H 
ALVAREZ      ALVEREZ     0.8         H 



Each entry in the SURNAME-VARIANT Table shall represent a bilateral relationship, and therefore 
only one entry will be required to support these bilateral relations (e.g., there will only be one entry in 
the table to define a relationship between "GOMEZ" and "GOMES"). 

Each entry in the SURNAME-VARIANT Table will be assigned a SNV-SCORE, which is based on a 
type of variation. The variation type will not be included in the SURNAME-VARIANT Table. 

The SURNAME-VARIANT Table will not include self-relationships (i.e., "GOMEZ" "GOMEZ" 0 is not in 
the table). 

5.1.10.1.2 Surname Variant Processing 

If SurnameCheckVariant = "T", the tool shall perform the following: 

• First, the tool shall look up each query SN segment to determine whether it is included in 
the SURNAME-VARIANT Table for the appropriate cultural perspective (SN-CULT-AFF-ID), 
as defined by the API-defined query type. 

• If the query SN segment is included in the subset of the SURNAME-VARIANT 
Table associated with the selected cultural perspective, the tool shall associate all 
of its known variants within that cultural perspective, and their variation score 
(SNV-SCORE), with the query SN segment for use in the evaluation process. 

• If the query SN segment is not included in the subset of the SURNAME-VARIANT 
Table associated with the selected cultural perspective, and the selected cultural 
perspective is not "Generic", then the tool shall look up each query SN segment to 
determine whether it is included in the "Generic" subset of the 
SURNAME-VARIANT Table. 

• If the query SN segment is included in the "Generic" subset of the 
SURNAME-VARIANT Table, the tool shall associate all of its known 
variants within the "Generic" subset, and their variation score 
(SNV-SCORE), with the query SN segment for use in the evaluation process. 

• If the query SN segment is not included in the "Generic" subset of the 
SURNAME-VARIANT Table, the tool shall perform no additional Surname 
Variant processing for this SN segment. 

5.1.10.2 Design Notes 

1. The types of variation defined for the Anglo contents of the SURNAME-VARIANT Table 
were determined based on different cultural perspectives using the phonetic workbench to 
generate variants that would not be handled by a digraph search. 

2. Hispanic Surname Variants were gathered from the HNA variant table LAS generated for 
the State Department. 

3. Korean Surname Variants were gathered from the data LAS generated for ORD-C and 
supplemented with entries in the DNC variant table LAS generated for the State 
Department. 

4. Chinese Surname Variants were gathered from the DNC variant table LAS generated for 
the State Department. 

5.1.10.3 Future Version Notes 

1. The developer or user shall be provided mechanisms for adding new variants to the 
SURNAME-VARIANT Table. 

2. The developer or user shall not be allowed to delete variants from the SURNAME-VARIANT 
Table. 

3. The SURNAME-VARIANT Table may support additional cultural perspectives. 

4. In the future, the tool may process SN Variants differently based on culture (note that this 
is part of the justification for separating Generic from Anglo). 

6. Evaluate and Score 

6.1 Functionality 

The tool shall compare each candidate name with the query name to determine whether the 
candidate name qualifies as a similar name. 

In order to determine whether the candidate name is similar to the query name, the tool shall: 

• Evaluate the Surname; 

• Evaluate the Given Name; 

• Determine if the SurnameScore exceeds SurnameThreshold; 

• Determine if the GivenNameScore exceeds GivenNameThreshold; and 

• then Compute a NameScore and Determine if Potential Match. 

6.1.1 Evaluate Surname 

In order to evaluate the Surname, the tool shall: 

• First, Determine a SurnameSegmentScore for each possible pairing of query and candidate 
SN segments; 

• Then Apply SN Segment Evaluation Factors to adjust the SurnameSegmentScore 
(resulting from either a Surname Variant match, Surname Initial match, Not Exist or Unknown 
match, or a Surname Digraph match) by multiplying the SurnameSegmentScore according to 
a set of Surname evaluation factors; and 

• Finally, Determine a SurnameScore. 

6.1.1.1 Determine SurnameSegmentScore 

The tool shall compare each of the candidate SN segments with the query SN segments to determine 
a SurnameSegmentScore for each pair of SN segments. 

This pairing of SN segments can be represented in an evaluation matrix, such as the one depicted 
below. 

          Granier                  Smith 
Smyth  |  SurnameSegmentScore1  |  SurnameSegmentScore2  | 

SurnameSegmentScore1 = SurnameSegmentScore determined when comparing "Smyth" and 
"Granier". 

SurnameSegmentScore2 = SurnameSegmentScore * Evaluation Factor determined when comparing 
"Smyth" and "Smith". 

The tool shall determine each SurnameSegmentScore as follows: 

• First, Check for Not Exist or Unknown Values (SurnameCheckUnknownNotExist, 
LastNameUnknownScore, NoLastNameScore) on the two SN segments. 

• If the two SN segments are not a Not Exist or Unknown match, Check for Surname Variant 
Match (SurnameCheckVariant, SNV-SCORE). 

• If the two SN segments are not a Not Exist or Unknown match or Surname Variant match, 
Check for Surname Initial Match (SurnameCheckInitial, SurnameInitialScore, 
SurnameExactInitialMatchScore). 

• If the two SN segments are not a Not Exist or Unknown match, or a Surname Variant match, 
or a Surname Initial match, then the tool shall Perform a Surname Digraph Comparison 
(SurnameCheckBias). 
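
The ordering of these checks can be sketched as a simple cascade in which the first check that applies supplies the score and the digraph comparison is the fallback. The function name `surname_segment_score` and the convention that a check returns `None` when it does not apply are assumptions for illustration only.

```python
def surname_segment_score(query, candidate, checks, digraph_comparison):
    """Return the SurnameSegmentScore for one pair of SN segments.

    checks: callables tried in order (Not Exist/Unknown, Variant, Initial);
    each returns a score, or None when that kind of match does not apply.
    digraph_comparison: callable used only when no earlier check matched.
    """
    for check in checks:
        score = check(query, candidate)
        if score is not None:
            return score
    return digraph_comparison(query, candidate)
```

A check whose API flag (for example SurnameCheckVariant) is not "T" would simply be left out of `checks` by the caller.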

6.1.1.1.1 Check for Not Exist or Unknown Values (SurnameCheckUnknownNotExist, 
LastNameUnknownScore, NoLastNameScore) 

If SurnameCheckUnknownNotExist = "T", then the tool shall determine whether the 
NoLastNameScore or LastNameUnknownScore can be assigned to the SurnameSegmentScore to 
handle a SN segment that does not exist or whose value is unknown. 

The following table illustrates the SurnameCheckUnknownNotExist conditions and associated values 
for setting the SurnameSegmentScore: 



comparand A   comparand B known      comparand B unknown            comparand B not exist 
known         N/A                    LastNameUnknownScore           NoLastNameScore 
unknown       LastNameUnknownScore   (LastNameUnknownScore + 1)/2   (LastNameUnknownScore + 1)/2 
not exist     NoLastNameScore        (LastNameUnknownScore + 1)/2   (NoLastNameScore + 1)/2 



If one comparand is defined as "unknown" and the other comparand is "known", then the tool shall set 
the SurnameSegmentScore = LastNameUnknownScore. 

If one comparand is identified as "unknown", and the other comparand is defined as "not exist", then 
the tool shall set the SurnameSegmentScore = (LastNameUnknownScore+1)/2. 

If one comparand is defined as "known" and the other comparand is "not exist", then the tool shall set 
the SurnameSegmentScore = NoLastNameScore. 

If both comparands are identified as "unknown", then the tool shall set the SurnameSegmentScore = 
(LastNameUnknownScore+1)/2. 

If both comparands are identified as "not exist", then the tool shall set the SurnameSegmentScore = 
(NoLastNameScore+1)/2. 
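
The table and rules above reduce to a small function. The sketch below assumes each comparand's state is passed as one of the strings "known", "unknown", or "not exist"; the function name and the use of `None` for the N/A cell (both known) are illustrative choices, not part of the specification.

```python
def unknown_not_exist_score(state_a, state_b,
                            last_name_unknown_score, no_last_name_score):
    """Return the SurnameSegmentScore for a Not Exist / Unknown match.

    Returns None when both comparands are known (the N/A cell), meaning the
    next check in the sequence should be tried.
    """
    states = {state_a, state_b}
    if states == {"known"}:
        return None
    if states == {"not exist"}:
        return (no_last_name_score + 1) / 2
    if "known" in states:
        # known vs unknown, or known vs not exist
        if "unknown" in states:
            return last_name_unknown_score
        return no_last_name_score
    # unknown vs unknown, or unknown vs not exist
    return (last_name_unknown_score + 1) / 2
```

Because the table is symmetric, the order of the two comparands does not matter.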

6.1.1.1.2 Check for Surname Variant Match (SurnameCheckVariant, SNV-SCORE) 

If SurnameCheckVariant = "T", the tool shall determine whether a SNV-SCORE can be applied. 

For every segment pairing, the tool shall determine whether the two SN segments have been 
predetermined to be variants of one another (i.e., defined in the SURNAME-VARIANT Table) by 
checking to see if the candidate SN segment is present in the list of variants associated with the 
query SN segment. 

If the candidate SN segment is present in the list of variants associated with the query SN segment, 
then the tool shall set the SurnameSegmentScore = SNV-SCORE associated with the query variant. 

6.1.1.1.3 Check for Surname Initial Match (SurnameCheckInitial, SurnameInitialScore, 
SurnameExactInitialMatchScore) 

If SurnameCheckInitial = "T" and the SN segment was not identified as a Surname Variant, the 
tool shall determine whether the SurnameInitialScore or SurnameExactInitialMatchScore can be 
applied. 

If comparand A's SN segment is a single character and comparand B's SN segment is a 
single character and they match, then the tool shall set the SurnameSegmentScore = 
SurnameExactInitialMatchScore. 

If comparand A's SN segment is a single character and comparand B's SN segment is more 
than one character and comparand A's SN segment matches the first character of comparand 
B's SN segment, the tool shall set the SurnameSegmentScore = SurnameInitialScore. 
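
A minimal sketch of the two initial-match rules follows. It implements only the cases stated above (comparand A is the single-character segment); whether the roles of A and B may be swapped is not stated here, so the sketch does not assume it. The function name and the `None` result for "no initial match" are illustrative.

```python
def initial_match_score(segment_a, segment_b,
                        surname_initial_score,
                        surname_exact_initial_match_score):
    """Return the score for an initial match, or None if the rules do not apply."""
    if len(segment_a) == 1 and len(segment_b) == 1:
        if segment_a == segment_b:
            return surname_exact_initial_match_score
        return None
    if len(segment_a) == 1 and len(segment_b) > 1 and segment_b[0] == segment_a:
        return surname_initial_score
    return None
```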

6.1.1.1.4 Perform a Surname Digraph Evaluation 

A value from 0.0 to 1.0 shall be calculated based on the number of digraphs which match between 
two SN segments. 

A digraph shall only participate in a match once. 

One point shall be awarded for each digraph that participates in a match; thus each digraph match 
shall result in exactly two points being added to the total digraph score. 

The SurnameSegmentScore shall be the total number of points assigned based on the matching 
digraphs (digraph score), divided by the total number of digraphs that occur in the two SN segments. 

For example, the SN segments "Garcia" and "Garica" are not an exact match. Of fourteen total 
digraphs involved in the evaluation, there are four matches, involving 8 digraphs. 

Query SN Segment:      Garcia 
Candidate SN Segment:  Garica 

Query digraphs:      #G Ga ar rc ci ia a# 
Candidate digraphs:  #G Ga ar ri ic ca a# 
Matching digraphs:   #G Ga ar a# 

Therefore, the name receives a digraph score of 8/14 = .57 
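
The unbiased digraph score can be sketched in a few lines. This is an illustration, not the tool's implementation: the function names are hypothetical, the "#" boundary marker follows the example, and matching is case-sensitive as written.

```python
from collections import Counter


def digraphs(segment, pad="#"):
    """Split a segment into digraphs, including the boundary digraphs."""
    padded = pad + segment + pad
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def digraph_score(query, candidate):
    """Two points per digraph match, divided by the total digraph count."""
    q = Counter(digraphs(query))
    c = Counter(digraphs(candidate))
    # Multiset intersection: each digraph participates in at most one match.
    matches = sum((q & c).values())
    total = sum(q.values()) + sum(c.values())
    return 2 * matches / total
```

For the Garcia/Garica pair this gives four matches out of fourteen digraphs, i.e. 8/14, which rounds to the .57 stated in the example.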

6.1.1.1.4.1 Apply Surname Left Digraph Bias (SurnameCheckBias) 

If SurnameCheckBias = "T", then a bias will be applied so that digraphs on the right end of the strings 
count less than those on the left in a particular SN segment. 

If SurnameCheckBias = "T", the tool shall apply bias by: 

• First, assigning the first contributing digraph in each SN segment a weight factor of 1.0, 
the second contributing digraph a weight factor of .9, and so on until the tenth contributing 
digraph is reached, at which point all remaining digraphs shall be assigned a weight factor 
of .1; 

• then determining which digraphs match between the two SN segments; 

• summing the weight factors assigned to the matched query SN digraphs with the weight factors 
assigned to the matched candidate SN digraphs; and 

• dividing by the sum of the weight factors of all contributing digraphs for both the query SN and 
the candidate SN. 



SurnameCheckBias:  T 

Query:      Moskyovich, FNU 
Candidate:  Markovich, FNU 

Query digraphs:       -M    Mo   os   sk   ky   yo   ov   vi   ic   ch   h- 
Weight factors:      (1.0) (.9) (.8) (.7) (.6) (.5) (.4) (.3) (.2) (.1) (.1) 

Candidate digraphs:   -M    Ma   ar   rk   ko   ov   vi   ic   ch   h- 
Weight factors:      (1.0) (.9) (.8) (.7) (.6) (.5) (.4) (.3) (.2) (.1) 

The following digraphs match: 

                            -M    ov   vi   ic   ch   h- 
Query Weight Factors:      (1.0) (.4) (.3) (.2) (.1) (.1) 
Candidate Weight Factors:  (1.0) (.5) (.4) (.3) (.2) (.1) 

Matched Digraphs = (1.0+.4+.3+.2+.1+.1) + (1.0+.5+.4+.3+.2+.1) = 4.6 

Total possible Digraphs = (1.0+.9+.8+.7+.6+.5+.4+.3+.2+.1+.1) + (1.0+.9+.8+.7+.6+.5+.4+.3+.2+.1) = 11.1 

Therefore, the SurnameSegmentScore = (4.6 / 11.1) = 0.41 
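
The left-bias calculation can be sketched as follows. The function names are hypothetical, the "-" boundary marker follows this example, and pairing each query digraph with the leftmost unused identical candidate digraph is an assumption: the text does not say which occurrence to use when a digraph repeats.

```python
def digraphs(segment, pad="-"):
    """Split a segment into digraphs, including the boundary digraphs."""
    padded = pad + segment + pad
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def bias_weights(count):
    """1.0, .9, ... down to .1 at the tenth digraph, then .1 for the rest."""
    return [max(10 - i, 1) / 10 for i in range(count)]


def biased_digraph_score(query, candidate):
    """Weighted digraph score with left bias applied to both segments."""
    q, c = digraphs(query), digraphs(candidate)
    q_weights, c_weights = bias_weights(len(q)), bias_weights(len(c))
    used = [False] * len(c)  # a digraph may participate in only one match
    matched = 0.0
    for i, q_digraph in enumerate(q):
        for j, c_digraph in enumerate(c):
            if not used[j] and q_digraph == c_digraph:
                used[j] = True
                matched += q_weights[i] + c_weights[j]
                break
    return matched / (sum(q_weights) + sum(c_weights))
```

For "Moskyovich" and "Markovich" the matched weights sum to 4.6 and the total possible weights to 11.1, reproducing the 0.41 of the worked example after rounding.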

6.1.1.1.5 Design Notes 

1. We may want to support a Right Bias in the future. 

6.1.1.2 Apply SN Segment Evaluation Factors 

SN Segment Evaluation Factors will be applied, if appropriate, to each segment in the evaluation 
matrix. 

The tool shall determine whether to apply certain SN Segment Evaluation Factors in determining a 
similar name, based on the application of the following set of logical parameters: 

• Determine Relative Position of SN Segments (SurnameAnchorSegment); 

• Apply Surname Out of Position Factor (SurnameOutOfPositionFactor); 

• Apply Surname Anchor Segment Factor (SurnameAnchorSegment, 
SurnameAnchorFactor); and 

• Apply Surname TAQ Factors (SurnameCheckTAQ, 
SurnameTAQDeleteFactor, SurnameTAQDisregardFactor, 
SurnameTAQDisregardAbsentFactor, SurnameTAQDeleteAbsentFactor). 

6.1.1.2.1 Determine Relative Position of SN Segments (SurnameAnchorSegment) 

In order to determine the relative position in both comparands, the tool shall establish an "index" of a 
segment based on the SurnameAnchorSegment. 

For SurnameAnchorSegment = "none" or "first", the tool shall count segments from left to right. 
For SurnameAnchorSegment = "last", the tool shall count segments from right to left. 

6.1.1.2.2 Apply Surname Out of Position Factor (SurnameOutOfPositionFactor) 

If a SN segment is out of position, the SurnameOutOfPositionFactor will always be applied. 

If two SN segments are not in the same relative position in both comparands, the tool shall multiply 
the SurnameSegmentScore by the SurnameOutOfPositionFactor. 

In the example cited below, "Smyth" and "Smith" are out of position, and thus the 
SurnameOutOfPositionFactor will be applied to SurnameSegmentScore2. 

          Granier                  Smith 
Smyth  |  SurnameSegmentScore1  |  SurnameSegmentScore2  | 



6.1.1.2.3 Apply Surname Anchor Segment Factor (SurnameAnchorFactor,
SurnameAnchorSegment)

The SurnameAnchorFactor is used to identify and emphasize the importance of one segment of the
surname over another if more than one SN segment exists.

The SurnameAnchorFactor shall never be applied if the SurnameMode is "average" as this would
doubly discount the contribution of a single SN segment when determining an overall SN score.

The SurnameAnchorFactor shall never be applied if the SurnameOutOfPositionFactor has already
been applied, even if the SurnameAnchorSegment = "first" or "last". Thus, the SN segments must be
in position if the SurnameAnchorFactor is to be applied.

When the SurnameAnchorSegment = "none", neither SN segment will be assigned more weight than
the other (i.e., the Anchor segment is essentially turned off). When the SurnameAnchorSegment = "first",
the first (i.e., left-most) SN segment will be assigned more weight, and when the
SurnameAnchorSegment = "last", the last (i.e., right-most) SN segment will be assigned more weight.

If two SN segments are in the same relative position in both comparands (i.e., 
SurnameOutOfPositionFactor did not apply), and SurnameMode is "lowest" or "highest", and their 
position is not the SurnameAnchorSegment, and SurnameAnchorSegment = "first" or "last", then the 
tool shall multiply the SurnameSegmentScore by the SurnameAnchorFactor. 
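
The position and anchor rules of sections 6.1.1.2.1 through 6.1.1.2.3 can be sketched as follows. This is a minimal illustration and not the tool's actual API: the names AnchorSegment, ScoreMode, SegmentPos, relativeIndex and applyPositionFactors are hypothetical, and the sketch assumes that the anchor segment always has relative index 0 once segments are counted from the anchor side.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types standing in for the SurnameAnchorSegment and SurnameMode parameters.
enum class AnchorSegment { None, First, Last };
enum class ScoreMode { Highest, Average, Lowest };

struct SegmentPos {
    std::size_t pos;    // zero-based position counted from the left
    std::size_t count;  // number of segments in the name
};

// "none" and "first" count from left to right; "last" counts from right to left.
std::size_t relativeIndex(SegmentPos s, AnchorSegment anchor) {
    assert(s.pos < s.count);
    return anchor == AnchorSegment::Last ? s.count - 1 - s.pos : s.pos;
}

// Applies at most one of the two factors to a segment score.
double applyPositionFactors(double segmentScore, SegmentPos query, SegmentPos candidate,
                            AnchorSegment anchor, ScoreMode mode,
                            double outOfPositionFactor, double anchorFactor) {
    const std::size_t qi = relativeIndex(query, anchor);
    const std::size_t ci = relativeIndex(candidate, anchor);

    // Out of position: this factor is always applied and excludes the anchor factor.
    if (qi != ci) return segmentScore * outOfPositionFactor;

    // In position: the anchor factor discounts non-anchor segments, but only when
    // an anchor is configured and the mode is "highest" or "lowest".
    const bool anchorActive = anchor != AnchorSegment::None && mode != ScoreMode::Average;
    if (anchorActive && qi != 0) return segmentScore * anchorFactor;

    return segmentScore;
}
```

In the "Granier Smith" versus "Smyth" example, "Smith" is the second of two segments and "Smyth" is the first of one, so with left-to-right counting the indices differ and only the out-of-position factor is applied.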

6.1.1.2.4 Apply Surname TAQ Factors (SurnameCheckTAQ, SurnameTAQDeleteFactor, 
SurnameTAQDisregardFactor, SurnameTAQDisregardAbsentFactor, 
SurnameTAQDeleteAbsentFactor) 

6.1.1.2.4.1 Functionality

DISREGARD TAQs are viewed during evaluation as more important than DELETE TAQs. Two TAQs
of the same type (i.e., DELETE or DISREGARD) that do not match are viewed during evaluation as
more important than absence of a TAQ type in one comparand and presence of that same TAQ type
in the other comparand.

When the SurnameCheckTAQ = "off" or "remove", no TAQ processing shall take place during the
evaluation process.

When the SurnameCheckTAQ = "score", TAQs that were identified, removed, and associated with
each relevant SN segment during preprocessing shall be factored into the SurnameSegmentScore.

If SurnameCheckTAQ = "score", the tool shall determine which of the following four parameters can 
be applied to the SurnameSegmentScore: 

• SurnameTAQDisregardFactor; 

• SurnameTAQDeleteFactor; 

• SurnameTAQDisregardAbsentFactor; and 

• SurnameTAQDeleteAbsentFactor. 

TAQ processing shall be performed as follows:

• First, determine whether the query name segment and the candidate name segment each
have associated TAQs identified during pre-processing.

• If no TAQs are associated with either segment, then the tool shall not adjust the
SurnameSegmentScore (i.e., DELETE TAQs None Occur, DISREGARD TAQs None
Occur).

• If one comparand has all DELETE TAQs associated with it and the other comparand has
no TAQs associated with it, then the SurnameSegmentScore will be multiplied by the
SurnameTAQDeleteAbsentFactor (i.e., DELETE TAQs Absent, DISREGARD TAQs None
Occur).

• If one or more DISREGARD TAQs are associated with one comparand and not the other
(i.e., either the query name segment or the candidate name segment has one or more
DISREGARD TAQs), then the tool shall apply the SurnameTAQDisregardAbsentFactor
(i.e., DELETE TAQs >=1 Match, No Match, Absent, or None Occur; DISREGARD TAQs
Absent).

• If DISREGARD TAQs are present in both comparands, then the tool shall determine
whether there are any matches on any of the DISREGARD TAQs:

    • If any matches are found, then the tool shall determine if there are any matches on
    any DELETE TAQs.

        • If no DELETE TAQs are present, then the tool shall not adjust the
        SurnameSegmentScore (i.e., DELETE TAQs None Occur, DISREGARD
        TAQs >=1 Match).

        • If DELETE TAQs are present in both comparands and no matches are
        found, then the SurnameSegmentScore will be multiplied by the
        SurnameTAQDeleteFactor (i.e., DELETE TAQs No Match, DISREGARD
        TAQs >=1 Match).

        • If DELETE TAQs are present in both comparands and a match is found,
        then the tool shall not adjust the SurnameSegmentScore (i.e., DELETE
        TAQs >=1 Match, DISREGARD TAQs >=1 Match).

        • If DELETE TAQs are present in one comparand, but not the other, then the
        SurnameSegmentScore will be multiplied by the
        SurnameTAQDeleteAbsentFactor (i.e., DELETE TAQs Absent,
        DISREGARD TAQs >=1 Match).

    • If no match is found, then the SurnameSegmentScore will be multiplied by the
    SurnameTAQDisregardFactor (i.e., DELETE TAQs >=1 Match, No Match, Absent, or
    None Occur; DISREGARD TAQs No Match).

• If both comparands have all DELETE TAQs associated with them, the tool shall determine
if any of the DELETE TAQs match:

    • If there is any match, then the tool shall not adjust the SurnameSegmentScore
    (i.e., DELETE TAQs >=1 Match, DISREGARD TAQs None Occur).

    • If there is no match, then the SurnameSegmentScore will be multiplied by the
    SurnameTAQDeleteFactor (i.e., DELETE TAQs No Match, DISREGARD TAQs
    None Occur).

The following table describes the conditions governing the application of TAQ parameters as
described in the text above:

DELETE TAQ(s) | DISREGARD TAQ(s) | Impact on SurnameSegmentScore
None Occur    | None Occur       | No Change
None Occur    | No Match         | Apply SurnameTAQDisregardFactor
None Occur    | Absent           | Apply SurnameTAQDisregardAbsentFactor
None Occur    | >=1 Match        | No Change
No Match      | None Occur       | Apply SurnameTAQDeleteFactor
No Match      | No Match         | Apply SurnameTAQDisregardFactor
No Match      | Absent           | Apply SurnameTAQDisregardAbsentFactor
No Match      | >=1 Match        | Apply SurnameTAQDeleteFactor
Absent        | None Occur       | Apply SurnameTAQDeleteAbsentFactor
Absent        | No Match         | Apply SurnameTAQDisregardFactor
Absent        | Absent           | Apply SurnameTAQDisregardAbsentFactor
Absent        | >=1 Match        | Apply SurnameTAQDeleteAbsentFactor
>=1 Match     | None Occur       | No Change
>=1 Match     | No Match         | Apply SurnameTAQDisregardFactor
>=1 Match     | Absent           | Apply SurnameTAQDisregardAbsentFactor
>=1 Match     | >=1 Match        | No Change


"Match" in the table indicates that the stated TAQ Type occurs in both comparands and at
least one of the TAQ Type values occurs in both comparands, i.e., the TAQ Type values are
the same. For example, the DELETE TAQ value "Mr" may occur in both comparands.

"No Match" in the table indicates that the stated TAQ Type occurs in both comparands but
none of the TAQ Type values occurs in both comparands, i.e., the values are not the same.
For example, the single DELETE TAQ value "Mr" may occur in one comparand and the single
DELETE TAQ value "Mrs" may occur in the other comparand.

"Absent" in the table indicates that the stated TAQ Type is absent in one of the comparands
but occurs in the other comparand. For example, the DELETE TAQ value "Mr" may occur in
one comparand and there may be no DELETE TAQ value at all in the other comparand.

"None Occur" in the table indicates that the stated TAQ Type does not occur in either 
comparand. For example, no DELETE TAQs occur in either comparand. 
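
The table reduces to a short decision rule: the DISREGARD state is examined first, and the DELETE state matters only when the DISREGARD state is "None Occur" or ">=1 Match". The sketch below illustrates that reading. The names TaqState, TaqFactors, classify and taqMultiplier are hypothetical, TAQ values are modeled as plain strings, and rejecting negative factors is an assumption rather than a stated requirement.

```cpp
#include <cassert>
#include <set>
#include <string>

// The four states used in the table for one TAQ type (DELETE or DISREGARD).
enum class TaqState { NoneOccur, NoMatch, Absent, Match };

// Hypothetical holder for the four Surname TAQ factor parameters.
struct TaqFactors {
    double disregardFactor;        // SurnameTAQDisregardFactor
    double deleteFactor;           // SurnameTAQDeleteFactor
    double disregardAbsentFactor;  // SurnameTAQDisregardAbsentFactor
    double deleteAbsentFactor;     // SurnameTAQDeleteAbsentFactor
};

// Classify one TAQ type from the values found in each comparand.
TaqState classify(const std::set<std::string>& query, const std::set<std::string>& candidate) {
    if (query.empty() && candidate.empty()) return TaqState::NoneOccur;
    if (query.empty() || candidate.empty()) return TaqState::Absent;
    for (const std::string& taq : query) {
        if (candidate.count(taq) > 0) return TaqState::Match;  // at least one shared value
    }
    return TaqState::NoMatch;
}

// Returns the multiplier for the segment score; 1.0 corresponds to "No Change".
double taqMultiplier(TaqState deleteState, TaqState disregardState, const TaqFactors& f) {
    assert(f.disregardFactor >= 0.0 && f.deleteFactor >= 0.0);
    assert(f.disregardAbsentFactor >= 0.0 && f.deleteAbsentFactor >= 0.0);

    // DISREGARD TAQs take precedence over DELETE TAQs.
    if (disregardState == TaqState::NoMatch) return f.disregardFactor;
    if (disregardState == TaqState::Absent) return f.disregardAbsentFactor;

    // DISREGARD is "None Occur" or ">=1 Match": the DELETE state decides.
    if (deleteState == TaqState::NoMatch) return f.deleteFactor;
    if (deleteState == TaqState::Absent) return f.deleteAbsentFactor;
    return 1.0;
}
```

A caller would classify the DELETE and DISREGARD values separately and multiply the SurnameSegmentScore by the returned value only when SurnameCheckTAQ = "score".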

6.1.1.2.4.2 Future Version Notes 

• The SurnameTAQDisregardFactor will be replaced by the highest
TAQ-DISREGARD-WEIGHT for each DISREGARD TAQ relationship that is defined in a
TAQ-DISREGARD-WEIGHT Table. The TAQ-DISREGARD-WEIGHT Table defines a weighted relationship
between two TAQs that occur within a specified cultural boundary or partition. There may
be a default TAQ-DISREGARD-WEIGHT established so that only those TAQ relationships
that warrant special weighting may be entered into the TAQ-DISREGARD-WEIGHT Table.

• When the TAQ-DISREGARD-WEIGHT Table is implemented, the importance of
DISREGARD versus DELETE TAQs may change. For example, MR and MRS are
currently defined as DELETE TAQs, and therefore considered less important than other
TAQ values. However, their relationship to one another may be treated with more
significance in later versions due to gender specification.

6.1.1.3 Determine SurnameScore

In order to determine the SurnameScore, the tool shall perform the following:

• Compute the Highest SurnameSegmentScore(s) (SurnameMode="Highest")

• Compute the Best Combination of SurnameSegmentScore(s)
(SurnameMode="Average")

• Compute the Lowest SurnameSegmentScore(s) (SurnameMode="Lowest")

If either comparand has just one SN segment, then the tool shall set the SurnameScore = the Highest
(Best) SurnameSegmentScore found in the evaluation matrix.

If more than one segment occurs in both surnames, then the tool shall Apply the Surname Mode in
its determination of the SurnameScore.

6.1.1.3.1 Compute Highest SurnameSegmentScore(s) (SurnameMode="Highest")

The tool shall compute the highest set of SurnameSegmentScores (includes the highest 
SurnameSegmentScores) from the evaluation matrix of scores. 

During the evaluation of the matrix, a given row or column shall contribute one and only one score. 

The tool shall select the combination of matrix values (with no row or column being used more than 
once) that includes the highest set of SurnameSegmentScores. 

In the following example, the highest SurnameSegmentScores will be 1.0 and .57, since 1.0 is the
highest SurnameSegmentScore and .57 is the only remaining score in a different row and column.

        | Garcia | Garza |
Garica  | .57    | .62   |
Garza   | .62    | 1.0   |

In the following example, the highest SurnameSegmentScores will be 1.0 and .62, since 1.0 is the
highest SurnameSegmentScore, and .62 is the next highest SurnameSegmentScore that is in a
different row and column combination in the matrix.

        | Garcia | Garza | Garza |
Garica  | .57    | .62   | .62   |
Garza   | .62    | 1.0   | 1.0   |


6.1.1.3.2 Compute Best Combination of SurnameSegmentScore(s) (SurnameMode="Average") 

The tool shall compute the best possible combination of scores (i.e., the Highest Average of
SurnameSegmentScores) from the evaluation matrix of scores.

During the evaluation of the matrix, a given row or column shall contribute one and only one score.

The tool shall select the combination of matrix values (with no row or column being used more than
once) that gives the highest sum.

In the following example, the best combination of SurnameSegmentScores will be 1.0 and .57, since
((1.0+.57)/2) > ((.62+.62)/2), i.e., .79 > .62.

        | Garcia | Garza |
Garica  | .57    | .62   |
Garza   | .62    | 1.0   |



6.1.1.3.3 Compute Lowest SurnameSegmentScore(s) (SurnameMode="Lowest")

The tool shall compute the lowest set of SurnameSegmentScores (includes the Lowest 
SurnameSegmentScores) from the evaluation matrix of scores. 

During the evaluation of the matrix, a given row or column shall contribute one and only one score. 

The tool shall select the combination of matrix values (with no row or column being used more than 
once) that includes the lowest set of SurnameSegmentScores. 

In the following example, the lowest SurnameSegmentScores will be .57 and 1.0, since .57 is the
lowest SurnameSegmentScore and 1.0 is the only remaining score in a different row and column.

        | Garcia | Garza |
Garica  | .57    | .62   |
Garza   | .62    | 1.0   |



In the following example, the lowest SurnameSegmentScores will be .57 and 1.0, since .57 is the
lowest SurnameSegmentScore, and 1.0 is the next lowest SurnameSegmentScore that is in a
different row and column combination in the matrix.

        | Garica | Garza |
Garcia  | .57    | .62   |
Garza   | .62    | 1.0   |
Garza   | .62    | 1.0   |

6.1.1.3.4 Apply Surname Mode (SurnameMode)

If SurnameMode = "highest", the tool shall set the SurnameScore = the highest
SurnameSegmentScore found in the evaluation matrix.

If SurnameMode = "average", the tool shall set the SurnameScore = the average of the
SurnameSegmentScores found in the evaluation matrix.

If SurnameMode = "lowest", the tool shall set the SurnameScore = the lowest
SurnameSegmentScore found in the evaluation matrix.
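
The selection rules of sections 6.1.1.3.1 through 6.1.1.3.4 can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: the names Matrix, selectGreedy, selectBestSum and nameScore are hypothetical, "highest" and "lowest" are read as a greedy pick of the best remaining cell in an unused row and column, the "average" combination is found by exhaustive search (adequate for the handful of segments in a name), and the average is taken over the selected cells.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

enum class ScoreMode { Highest, Average, Lowest };
using Matrix = std::vector<std::vector<double>>;  // one row per segment of one comparand

// Repeatedly take the highest (or lowest) remaining cell whose row and column are unused.
std::vector<double> selectGreedy(const Matrix& m, bool pickHighest) {
    const std::size_t rows = m.size();
    const std::size_t cols = m.empty() ? 0 : m[0].size();
    std::vector<bool> rowUsed(rows, false), colUsed(cols, false);
    std::vector<double> picked;
    for (std::size_t n = 0; n < std::min(rows, cols); ++n) {
        bool found = false;
        std::size_t bestR = 0, bestC = 0;
        for (std::size_t r = 0; r < rows; ++r) {
            for (std::size_t c = 0; c < cols; ++c) {
                if (rowUsed[r] || colUsed[c]) continue;
                const bool better = pickHighest ? m[r][c] > m[bestR][bestC]
                                                : m[r][c] < m[bestR][bestC];
                if (!found || better) { found = true; bestR = r; bestC = c; }
            }
        }
        rowUsed[bestR] = true;
        colUsed[bestC] = true;
        picked.push_back(m[bestR][bestC]);
    }
    return picked;
}

// Exhaustive search for the combination with the highest sum (no row or column reused).
std::vector<double> selectBestSum(const Matrix& in) {
    Matrix m = in;
    if (!m.empty() && m.size() > m[0].size()) {  // transpose so that rows <= cols
        Matrix t(m[0].size(), std::vector<double>(m.size()));
        for (std::size_t r = 0; r < m.size(); ++r)
            for (std::size_t c = 0; c < m[0].size(); ++c) t[c][r] = m[r][c];
        m = t;
    }
    const std::size_t rows = m.size();
    std::vector<std::size_t> perm(rows ? m[0].size() : 0);
    std::iota(perm.begin(), perm.end(), std::size_t{0});
    std::vector<double> best;
    double bestSum = -1.0;
    do {  // row r is paired with column perm[r]
        double sum = 0.0;
        for (std::size_t r = 0; r < rows; ++r) sum += m[r][perm[r]];
        if (sum > bestSum) {
            bestSum = sum;
            best.clear();
            for (std::size_t r = 0; r < rows; ++r) best.push_back(m[r][perm[r]]);
        }
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}

double nameScore(const Matrix& m, ScoreMode mode) {
    assert(!m.empty() && !m[0].empty());
    // A comparand with a single segment always yields the highest score in the matrix.
    if (m.size() == 1 || m[0].size() == 1) mode = ScoreMode::Highest;
    if (mode == ScoreMode::Average) {
        const std::vector<double> p = selectBestSum(m);
        return std::accumulate(p.begin(), p.end(), 0.0) / static_cast<double>(p.size());
    }
    const std::vector<double> p = selectGreedy(m, mode == ScoreMode::Highest);
    return mode == ScoreMode::Highest ? *std::max_element(p.begin(), p.end())
                                      : *std::min_element(p.begin(), p.end());
}
```

For the two-by-two Garica/Garza matrix used in the examples, this sketch selects 1.0 and .57 in every mode, so the resulting score is 1.0 for "highest", .57 for "lowest", and about .79 for "average".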

6.1.1.3.5 Determine SurnameCompressedScore (SurnameCheckCompressed,
SurnameCompressedScore)

This function handles names that are essentially the same name but are segmented differently (e.g.,
"de la Garcia" and "delaGarcia").

If SurnameCheckCompressed = "T", then the tool shall generate a SurnameCompressedScore in the
following manner:

• Create query and candidate Compressed SN fields from their original SN fields, by 
processing segmentation and removal markers, and then eliminating all remaining blanks. 

• Compare the query and candidate COMPRESSED SN fields to determine if there is an 
exact match. 

• If there is an exact match of the COMPRESSED SN fields, the tool shall set the 
SurnameScore to the higher score of the SurnameCompressedScore or the previously 
calculated SurnameScore. 
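
The compressed comparison can be sketched as follows. The function names compressName and applyCompressedScore are hypothetical. The sketch only removes blanks and assumes that segmentation and removal markers have already been processed and that both fields are already normalized to the same case, neither of which is modeled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Build a compressed field by dropping every blank (whitespace) character.
std::string compressName(const std::string& field) {
    std::string out;
    for (unsigned char ch : field) {
        if (!std::isspace(ch)) out.push_back(static_cast<char>(ch));
    }
    return out;
}

// On an exact match of the compressed fields, keep the higher of the two scores;
// otherwise the previously calculated score is left unchanged.
double applyCompressedScore(const std::string& querySN, const std::string& candidateSN,
                            double surnameScore, double surnameCompressedScore) {
    assert(surnameCompressedScore >= 0.0 && surnameCompressedScore <= 1.0);
    if (compressName(querySN) == compressName(candidateSN)) {
        return std::max(surnameScore, surnameCompressedScore);
    }
    return surnameScore;
}
```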



6.1.2 Evaluate Given Name 



In order to evaluate the Given Name, the tool shall: 

• First Determine a GivenNameSegmentScore for each possible combination of query and 
candidate GN segments; 

• Then, Apply GN Segment Evaluation Factors to adjust the GivenNameSegmentScore 
(resulting from either a Given Name Variant match, Given Name Initial match, not exist or 
unknown match, or a Given Name Digraph match) by multiplying the 
GivenNameSegmentScore according to a set of Given Name evaluation factors.

• Finally, Determine a GivenNameScore.

6.1.2.1 Determine GivenNameSegmentScore

The tool shall compare each of the candidate GN segments with the query GN segments to 
determine a GivenNameSegmentScore for each pair of GN segments. 

This pairing of GN segments can be represented in an evaluation matrix similar to the one described
in the section Determine SurnameSegmentScore.

The tool shall determine each GivenNameSegmentScore as follows:

• First Check for Not Exist or Unknown Values (GivenNameCheckUnknownNotExist,
FirstNameUnknownScore, NoFirstNameScore) on the two GN segments.

• If the two GN segments are not a Not Exist or Unknown match, Check for Given Name
Variant Match (GivenNameCheckVariant, GNV-SCORE)

• If the two GN segments are not a Not Exist or Unknown match or Given Name Variant
match, Check for Given Name Initial Match (GivenNameCheckInitial,
GivenNameInitialScore, GivenNameExactInitialMatchScore)

• If the two GN segments are not a Not Exist or Unknown match, or a Given Name Variant
match, or a Given Name Initial match, then the tool shall Perform a Given Name Digraph
Comparison (GivenNameCheckBias)

6.1.2.1.1 Check for Not Exist or Unknown Values (GivenNameCheckUnknownNotExist, 
FirstNameUnknownScore, NoFirstNameScore) 

If GivenNameCheckUnknownNotExist = "T", then the tool shall determine whether the
NoFirstNameScore or FirstNameUnknownScore can be assigned to the GivenNameSegmentScore to
handle a GN segment that does not exist or whose value is unknown.



The following table illustrates the GivenNameCheckUnknownNotExist conditions and associated
values for setting the GivenNameSegmentScore:

comparand A | comparand B known     | comparand B unknown           | comparand B not exist
known       | N/A                   | FirstNameUnknownScore         | NoFirstNameScore
unknown     | FirstNameUnknownScore | (FirstNameUnknownScore + 1)/2 | (FirstNameUnknownScore + 1)/2
not exist   | NoFirstNameScore      | (FirstNameUnknownScore + 1)/2 | (NoFirstNameScore + 1)/2

If one comparand is defined as "unknown" and the other comparand is "known", then the tool shall set
the GivenNameSegmentScore = FirstNameUnknownScore.

If one comparand is defined as "known" and the other comparand is "not exist", then the tool shall set
the GivenNameSegmentScore = NoFirstNameScore.

If both comparands are identified as either "not exist" or "unknown", and the comparands are not
defined the same, then the tool shall set the GivenNameSegmentScore =
(FirstNameUnknownScore + 1)/2.

If both comparands are identified as "unknown", then the tool shall set the GivenNameSegmentScore
= (FirstNameUnknownScore + 1)/2.

If both comparands are identified as "not exist", then the tool shall set the GivenNameSegmentScore
= (NoFirstNameScore + 1)/2.
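
The table above can be sketched as a single function. The names GnStatus and unknownNotExistScore are hypothetical, and returning an empty std::optional for the "N/A" cell (both segments known, so later checks decide the score) is an illustrative choice, not part of the specification.

```cpp
#include <cassert>
#include <optional>

// Status of a GN segment as used in the table: known, unknown, or not exist.
enum class GnStatus { Known, Unknown, NotExist };

// Returns the GivenNameSegmentScore dictated by the table, or an empty optional
// when both segments are known and the later checks must decide.
std::optional<double> unknownNotExistScore(GnStatus a, GnStatus b,
                                           double firstNameUnknownScore,
                                           double noFirstNameScore) {
    assert(firstNameUnknownScore >= 0.0 && firstNameUnknownScore <= 1.0);
    assert(noFirstNameScore >= 0.0 && noFirstNameScore <= 1.0);

    if (a == GnStatus::Known && b == GnStatus::Known) return std::nullopt;  // N/A

    if (a == GnStatus::NotExist && b == GnStatus::NotExist) {
        return (noFirstNameScore + 1.0) / 2.0;
    }
    if (a != GnStatus::Known && b != GnStatus::Known) {
        // Both unknown, or one unknown and one not exist.
        return (firstNameUnknownScore + 1.0) / 2.0;
    }

    // Exactly one comparand is known; the other decides the score.
    const GnStatus other = (a == GnStatus::Known) ? b : a;
    return other == GnStatus::Unknown ? firstNameUnknownScore : noFirstNameScore;
}
```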

6.1.2.1.2 Check for Given Name Variant Match (GivenNameCheckVariant, GNV-SCORE) 

If GivenNameCheckVariant = "T", the tool shall determine whether the GNV-SCORE can be applied.

For every segment pairing, the tool shall determine whether the two GN segments have been
predetermined to be variants of one another (i.e., defined in the GIVEN-NAME-VARIANT Table) by
checking to see if the candidate GN segment is present in the list of variants associated with the
query GN segment.

If the candidate GN segment is present in the list of variants associated with the query GN segment,
then the tool shall set the GivenNameSegmentScore = GNV-SCORE associated with the query
variant.

6.1.2.1.3 Check for Given Name Initial Match (GivenNameCheckInitial, GivenNameInitialScore,
GivenNameExactInitialMatchScore)

If GivenNameCheckInitial = "T" and the GN segment was not identified as a Given Name Variant,
then the tool shall determine whether the GivenNameInitialScore or
GivenNameExactInitialMatchScore can be applied.

If comparand A's GN segment is a single character and comparand B's GN segment is a
single character and they match, then the tool shall set the GivenNameSegmentScore =
GivenNameExactInitialMatchScore.

If comparand A's GN segment is a single character and comparand B's GN segment is more
than one character and comparand A's GN segment matches the first character of comparand
B's GN segment, the tool shall set the GivenNameSegmentScore = GivenNameInitialScore.
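
The two initial-match rules can be sketched as follows. The function name initialMatchScore is hypothetical. The sketch assumes the rule is symmetric (either comparand may be the single-character segment) and that both segments are already normalized to the same case; an empty result means no initial match applies and the digraph comparison should follow.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the score for an initial match, or an empty optional if none applies.
std::optional<double> initialMatchScore(const std::string& a, const std::string& b,
                                        double givenNameInitialScore,
                                        double givenNameExactInitialMatchScore) {
    assert(!a.empty() && !b.empty());

    // Both segments are single characters: only an exact match counts.
    if (a.size() == 1 && b.size() == 1) {
        if (a == b) return givenNameExactInitialMatchScore;
        return std::nullopt;
    }

    // One segment is an initial and the other a full name (symmetry is an assumption).
    if (a.size() == 1 || b.size() == 1) {
        if (a.front() == b.front()) return givenNameInitialScore;
    }
    return std::nullopt;
}
```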

6.1.2.1.4 Perform Given Name Digraph Evaluation

A value from 0.0 to 1.0 shall be calculated based on the number of digraphs which match between
two Given Name segments.

A digraph shall only participate in a match once.

One point shall be awarded for each digraph that participates in a match, thus each digraph match
shall result in exactly two points being added to the total digraph score.

The GivenNameSegmentScore shall be the total number of points assigned based on the matching 
digraphs (digraph score), divided by the number of digraphs that occur in the two GN segments. 
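
The unbiased digraph score can be sketched as follows. The function names digraphs and digraphScore are hypothetical. How digraphs are extracted is not stated in this section; the sketch assumes each segment is padded with one leading and one trailing blank, an assumption chosen because it reproduces the .62 ("Garcia" versus "Garza") and .57 ("Garica" versus "Garcia") values used in the evaluation matrix examples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split a segment into overlapping two-character digraphs, padding with one
// blank on each side (the padding is an assumption, see the lead-in).
std::vector<std::string> digraphs(const std::string& segment) {
    const std::string padded = " " + segment + " ";
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < padded.size(); ++i) {
        out.push_back(padded.substr(i, 2));
    }
    return out;
}

// Two points per digraph match, divided by the number of digraphs in both segments.
double digraphScore(const std::string& query, const std::string& candidate) {
    assert(!query.empty() && !candidate.empty());
    const std::vector<std::string> q = digraphs(query);
    std::vector<std::string> c = digraphs(candidate);
    const std::size_t total = q.size() + c.size();

    std::size_t points = 0;
    for (const std::string& d : q) {
        const auto it = std::find(c.begin(), c.end(), d);
        if (it != c.end()) {
            c.erase(it);  // a digraph participates in a match only once
            points += 2;  // one point for each of the two matched digraphs
        }
    }
    return static_cast<double>(points) / static_cast<double>(total);
}
```

Under the padding assumption, "Garcia" has seven digraphs and "Garza" has six, four pairs match, and the score is 8/13, which rounds to .62.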

6.1.2.1.4.1 Apply Given Name Left Digraph Bias (GivenNameCheckBias)

If GivenNameCheckBias = "T", then a bias will be applied so that digraphs on the right end of the
strings count less than those on the left in a particular GN segment.

If GivenNameCheckBias = "T", the tool shall apply bias by:

• first, assigning the first contributing digraph in each GN segment a weight factor of 1.0,
the second contributing digraph a weight factor of .9, and so on until the tenth contributing
digraph is reached, at which point all remaining digraphs shall be assigned a weight factor
of .1;

• then determining which digraphs match between the two GN segments;

• summing the weight factors assigned to the matched query GN digraphs with the weight factors
assigned to the matched candidate GN digraphs; and

• dividing by the sum of the weight factors of all contributing digraphs for both the query GN and
the candidate GN.
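
The left bias can be sketched as follows. The names weightTenths and biasedDigraphScore are hypothetical, every digraph is treated as contributing (an assumption), and the caller is assumed to have already determined which digraph positions matched. Weights are summed in integer tenths to avoid floating-point noise.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weight of the digraph at zero-based position i, in tenths:
// 10, 9, 8, ..., 1 for the first ten digraphs, then 1 for all remaining ones.
int weightTenths(std::size_t i) {
    return i < 9 ? 10 - static_cast<int>(i) : 1;
}

// matchedQuery and matchedCandidate hold the zero-based positions of the
// digraphs that took part in a match in each segment.
double biasedDigraphScore(std::size_t queryCount, const std::vector<std::size_t>& matchedQuery,
                          std::size_t candidateCount,
                          const std::vector<std::size_t>& matchedCandidate) {
    assert(queryCount + candidateCount > 0);

    int matched = 0;
    for (std::size_t p : matchedQuery) {
        assert(p < queryCount);
        matched += weightTenths(p);
    }
    for (std::size_t p : matchedCandidate) {
        assert(p < candidateCount);
        matched += weightTenths(p);
    }

    int total = 0;
    for (std::size_t i = 0; i < queryCount; ++i) total += weightTenths(i);
    for (std::size_t i = 0; i < candidateCount; ++i) total += weightTenths(i);

    return static_cast<double>(matched) / static_cast<double>(total);
}
```

With a query of eleven digraphs matched at weights 1.0, .4, .3, .2, .1, .1 and a candidate of ten digraphs matched at weights 1.0, .5, .4, .3, .2, .1, the sketch gives 4.6 / 11.1, or about 0.41.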

6.1.2.1.5 Design Notes

1. We may want to support a Right Bias in the future.

6.1.2.2 Apply GN Segment Evaluation Factors

The tool shall determine whether to apply certain GN Segment Evaluation Factors in determining a 
similar name, based on the application of the following set of logical parameters: 

• Determine Relative Position of GN Segments (GivenNameAnchorSegment); 

• Apply Given Name Out of Position Factor (GivenNameOutOfPositionFactor);

• Apply Given Name Anchor Segment Factor (GivenNameAnchorSegment,
GivenNameAnchorFactor); and

• Apply Given Name TAQ Factors (GivenNameCheckTAQ, GivenNameTAQDeleteFactor,
GivenNameTAQDisregardFactor, GivenNameTAQDisregardAbsentFactor,
GivenNameTAQDeleteAbsentFactor).

6.1.2.2.1 Determine Relative Position of GN Segments (GivenNameAnchorSegment)

In order to determine the relative position in both comparands, the tool shall establish an "index" of a
segment based on the GivenNameAnchorSegment.

For GivenNameAnchorSegment = "none" or "first", the tool shall count segments from left to right.
For GivenNameAnchorSegment = "last", the tool shall count segments from right to left.

6.1.2.2.2 Apply Given Name Out of Position Factor (GivenNameOutOfPositionFactor)

If a GN segment is out of position, the GivenNameOutOfPositionFactor will always be applied.

If two GN segments are not in the same relative position in both comparands, the tool shall multiply
the GivenNameSegmentScore by the GivenNameOutOfPositionFactor.

In the example cited below, "Jeffrey" and "Jeffrey" are out of position, and thus the
GivenNameOutOfPositionFactor will be applied to GivenNameSegmentScore3.

| Jeffrey                | Andrew                 |
| GivenNameSegmentScore1 | GivenNameSegmentScore2 |
| GivenNameSegmentScore3 | GivenNameSegmentScore4 |




6.1.2.2.3 Apply Given Name Anchor Segment Factor (GivenNameAnchorFactor,
GivenNameAnchorSegment)

The GivenNameAnchorFactor is used to identify and emphasize the importance of one segment of
the given name over another if more than one GN segment exists.

The GivenNameAnchorFactor shall never be applied if the GivenNameMode is "average" as this
would doubly discount the contribution of a single GN segment when determining an overall GN
score.

The GivenNameAnchorFactor shall never be applied if the GivenNameOutOfPositionFactor has
already been applied, even if the GivenNameAnchorSegment = "first" or "last". Thus, the GN
segments must be in position if the GivenNameAnchorFactor is to be applied.

When the GivenNameAnchorSegment = "none", neither GN segment will be assigned more weight
than the other (i.e., the Anchor segment is essentially turned off). When the GivenNameAnchorSegment
= "first", the first (i.e., left-most) GN segment will be assigned more weight, and when the
GivenNameAnchorSegment = "last", the last (i.e., right-most) GN segment will be assigned more
weight.

If two GN segments are in the same relative position in both comparands (i.e.,
GivenNameOutOfPositionFactor did not apply), and their position is not the
GivenNameAnchorSegment, and GivenNameAnchorSegment = "first" or "last", and GivenNameMode
is "lowest" or "highest", then the tool shall multiply the GivenNameSegmentScore by the
GivenNameAnchorFactor.

6.1.2.2.4 Apply Given Name TAQ Factors (GivenNameCheckTAQ, GivenNameTAQDeleteFactor,
GivenNameTAQDisregardFactor, GivenNameTAQDisregardAbsentFactor,
GivenNameTAQDeleteAbsentFactor)

6.1.2.2.4.1 Functionality

DISREGARD TAQs are viewed as more important than DELETE TAQs. Two TAQs of the same type
(i.e., DELETE or DISREGARD) that do not match are viewed as more important than absence of a
TAQ type in one comparand and presence of that same TAQ type in the other comparand.

When the GivenNameCheckTAQ = "off" or "remove", no TAQ processing will take place during the
evaluation process.

When the GivenNameCheckTAQ = "score", TAQs that were identified, removed, and associated with
each relevant GN segment during preprocessing will be factored into the GivenNameSegmentScore.

If GivenNameCheckTAQ = "score", the tool shall determine which of the following four parameters
can be applied to the GivenNameSegmentScore:

• GivenNameTAQDisregardFactor;

• GivenNameTAQDeleteFactor;

• GivenNameTAQDisregardAbsentFactor; and

• GivenNameTAQDeleteAbsentFactor.

TAQ processing shall be performed as follows:

• First, determine whether the query name segment and the candidate name segment each
have associated TAQs identified during pre-processing.

• If no TAQs are associated with either segment, then the tool shall not adjust the
GivenNameSegmentScore (i.e., DELETE TAQs None Occur, DISREGARD TAQs None
Occur).

• If one comparand has all DELETE TAQs associated with it and the other comparand has
no TAQs associated with it, then the GivenNameSegmentScore will be multiplied by the
GivenNameTAQDeleteAbsentFactor (i.e., DELETE TAQs Absent, DISREGARD TAQs
None Occur).

• If one or more DISREGARD TAQs are associated with one comparand and not the other
(i.e., either the query name segment or the candidate name segment has one or more
DISREGARD TAQs), then the tool shall apply the GivenNameTAQDisregardAbsentFactor
(i.e., DELETE TAQs >=1 Match, No Match, Absent, or None Occur; DISREGARD TAQs
Absent).

• If DISREGARD TAQs are present in both comparands, then the tool shall determine
whether there are any matches on any of the DISREGARD TAQs:

    • If any matches are found, then the tool shall determine if there are any matches on
    any DELETE TAQs.

        • If no DELETE TAQs are present, then the tool shall not adjust the
        GivenNameSegmentScore (i.e., DELETE TAQs None Occur, DISREGARD
        TAQs >=1 Match).

        • If DELETE TAQs are present in both comparands and no matches are
        found, then the GivenNameSegmentScore will be multiplied by the
        GivenNameTAQDeleteFactor (i.e., DELETE TAQs No Match, DISREGARD
        TAQs >=1 Match).

        • If DELETE TAQs are present in both comparands and a match is found,
        then the tool shall not adjust the GivenNameSegmentScore (i.e., DELETE
        TAQs >=1 Match, DISREGARD TAQs >=1 Match).

        • If DELETE TAQs are present in one comparand, but not the other, then the
        GivenNameSegmentScore will be multiplied by the
        GivenNameTAQDeleteAbsentFactor (i.e., DELETE TAQs Absent,
        DISREGARD TAQs >=1 Match).

    • If no match is found, then the GivenNameSegmentScore will be multiplied by the
    GivenNameTAQDisregardFactor (i.e., DELETE TAQs >=1 Match, No Match, Absent, or
    None Occur; DISREGARD TAQs No Match).

• If both comparands have all DELETE TAQs associated with them, the tool shall determine
if any of the DELETE TAQs match:

    • If there is any match, then the tool shall not adjust the GivenNameSegmentScore
    (i.e., DELETE TAQs >=1 Match, DISREGARD TAQs None Occur).

    • If there is no match, then the GivenNameSegmentScore will be multiplied by the
    GivenNameTAQDeleteFactor (i.e., DELETE TAQs No Match, DISREGARD TAQs
    None Occur).



The following table describes the conditions governing the application of TAQ parameters as
described in the text above:

DELETE TAQ(s) | DISREGARD TAQ(s) | Impact on GivenNameSegmentScore
None Occur    | None Occur       | No Change
None Occur    | No Match         | Apply GivenNameTAQDisregardFactor
None Occur    | Absent           | Apply GivenNameTAQDisregardAbsentFactor
None Occur    | >=1 Match        | No Change
No Match      | None Occur       | Apply GivenNameTAQDeleteFactor
No Match      | No Match         | Apply GivenNameTAQDisregardFactor
No Match      | Absent           | Apply GivenNameTAQDisregardAbsentFactor
No Match      | >=1 Match        | Apply GivenNameTAQDeleteFactor
Absent        | None Occur       | Apply GivenNameTAQDeleteAbsentFactor
Absent        | No Match         | Apply GivenNameTAQDisregardFactor
Absent        | Absent           | Apply GivenNameTAQDisregardAbsentFactor
Absent        | >=1 Match        | Apply GivenNameTAQDeleteAbsentFactor
>=1 Match     | None Occur       | No Change
>=1 Match     | No Match         | Apply GivenNameTAQDisregardFactor
>=1 Match     | Absent           | Apply GivenNameTAQDisregardAbsentFactor
>=1 Match     | >=1 Match        | No Change


"Match" in the table indicates that the stated TAQ Type occurs in both comparands and at
least one of the TAQ Type values occurs in both comparands, i.e., the TAQ Type values are
the same. For example, the DELETE TAQ value "Mr" may occur in both comparands.

"No Match" in the table indicates that the stated TAQ Type occurs in both comparands but
none of the TAQ Type values occurs in both comparands, i.e., the values are not the same.
For example, the single DELETE TAQ value "Mr" may occur in one comparand and the single
DELETE TAQ value "Mrs" may occur in the other comparand.

"Absent" in the table indicates that the stated TAQ Type is absent in one of the comparands
but occurs in the other comparand. For example, the DELETE TAQ value "Mr" may occur in
one comparand and there may be no DELETE TAQ value at all in the other comparand.

"None Occur" in the table indicates that the stated TAQ Type does not occur in either
comparand. For example, no DELETE TAQs occur in either comparand.

6.1.2.2.4.2 Future Version Notes

• The GivenNameTAQDisregardFactor will be replaced by the highest
TAQ-DISREGARD-WEIGHT for each DISREGARD TAQ relationship that is defined in a
TAQ-DISREGARD-WEIGHT Table. The TAQ-DISREGARD-WEIGHT Table defines a weighted relationship
between two TAQs that occur within a specified cultural boundary or partition. There may
be a default TAQ-DISREGARD-WEIGHT established so that only those TAQ relationships
that warrant special weighting may be entered into the TAQ-DISREGARD-WEIGHT Table.

6.1.2.3 Determine GivenNameScore 

In order to determine the GivenNameScore, the tool shall:

• Compute the Highest GivenNameSegmentScore(s) (GivenNameMode="Highest");

• Compute the Best Combination of GivenNameSegmentScore(s)
(GivenNameMode="Average"); and

• Compute the Lowest GivenNameSegmentScore(s) (GivenNameMode="Lowest").

If either comparand has just one GN segment, then the tool shall set the GivenNameScore = the
Highest (Best) GivenNameSegmentScore found in the evaluation matrix.

If more than one segment occurs in both Given Names, then the tool shall Apply the Given Name
Mode in its determination of the GivenNameScore.

6.1.2.3.1 Compute Highest GivenNameSegmentScore(s) (GivenNameMode="Highest")

The tool shall compute the highest set of GivenNameSegmentScores (includes the highest
GivenNameSegmentScores) from the evaluation matrix of scores. 

During the evaluation of the matrix, a given row or column shall contribute one and only one score. 

The tool shall select the combination of matrix values (with no row or column being used more than 
once) that includes the highest set of GivenNameSegmentScores. 

In the following example, the highest GivenNameSegmentScores will be 1.0 and .57, since 1.0 is the
highest GivenNameSegmentScore and .57 is the only remaining score in a different row and column.

        | Garcia | Garza |
Garica  | .57    | .62   |
Garza   | .62    | 1.0   |

In the following example, the highest GivenNameSegmentScores will be 1.0 and .62, since 1.0 is the
highest GivenNameSegmentScore, and .62 is the next highest GivenNameSegmentScore that is in a
different row and column combination in the matrix.

        | Garcia | Garza | Garza |
Garica  | .57    | .62   | .62   |
Garza   | .62    | 1.0   | 1.0   |



6.1.2.3.2 Compute Best Combination of GivenNameSegmentScore(s) (GivenNameMode="Average")

The tool shall compute the best possible combination of scores (i.e., the Highest Average of 
GivenNameSegmentScores) from the evaluation matrix of scores. 

During the evaluation of the matrix, a given row or column shall contribute one and only one score. 

The tool shall select the combination of matrix values (with no row or column being used more than 
once) that gives the highest sum. 

In the following example, the best combination of GivenNameSegmentScores will be 1.0 and .57,
since ((1.0+.57)/2) > ((.62+.62)/2), i.e., .79 > .62.

        | Garcia | Garza |
Garica  | .57    | .62   |
Garza   | .62    | 1.0   |



6.1.2.3.3 Compute Lowest GivenNameSegmentScore(s) (GivenNameMode="Lowest") 

The tool shall compute the lowest set of GivenNameSegmentScores (includes the Lowest 
GivenNameSegmentScores) from the evaluation matrix of scores. 

During the evaluation of the matrix, a given row or column shall contribute one and only one score. 

The tool shall select the combination of matrix values (with no row or column being used more than 
once) that includes the lowest set of GivenNameSegmentScores. 

In the following example, the lowest GivenNameSegmentScores will be .57 and 1.0, since .57 is the
lowest GivenNameSegmentScore, and 1.0 is the only remaining score in a different row and column.

                Garcia    Garza
    Garica       .57       .62
    Garza        .62      1.0

In the following example, the lowest GivenNameSegmentScores will be .57 and 1.0, since .57 is the
lowest GivenNameSegmentScore, and 1.0 is the next lowest GivenNameSegmentScore that is in a
different row and column combination in the matrix.

                Garcia    Garza    Garza
    Garica       .57       .62      .62
    Garza        .62      1.0      1.0



6.1.2.3.4 Apply Given Name Mode (GivenNameMode) 

If GivenNameMode = "highest", the tool shall set the GivenNameScore = the highest 
GivenNameSegmentScore found in the evaluation matrix. 

If GivenNameMode = "average", the tool shall set the GivenNameScore = the average of the 
GivenNameSegmentScores found in the evaluation matrix. 

If GivenNameMode = "lowest", the tool shall set the GivenNameScore = the lowest 
GivenNameSegmentScore found in the evaluation matrix. 

6.1.2.3.5 Determine GivenNameCompressedScore (GivenNameCheckCompressed,
GivenNameCompressedScore)

This function handles names that are essentially the same name but are segmented differently (e.g.,
"Anne Marie" and "AnneMarie").

If GivenNameCheckCompressed = "T", then the tool shall generate a GivenNameCompressedScore
in the following manner:

• Create query and candidate Compressed GN fields from their original GN fields, by
processing segmentation and removal markers, and then eliminating all remaining blanks.

• Compare the query and candidate Compressed GN fields to determine if there is an
exact match.

• If there is an exact match of the Compressed GN fields, the tool shall set the
GivenNameScore to the higher score of the GivenNameCompressedScore or the
previously calculated GivenNameScore.
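The three steps above can be sketched as follows. This is an illustration only: compressGivenName and applyCompressedScore are hypothetical names, the handling of segmentation and removal markers is assumed to have happened before the call, and folding to upper case before comparing is an assumption that the specification does not state.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: drops all blanks and folds case (case folding is an
// assumption). Segmentation/removal markers are assumed to be processed already.
std::string compressGivenName(const std::string& givenName) {
    std::string out;
    for (unsigned char ch : givenName) {
        if (!std::isspace(ch)) out.push_back(static_cast<char>(std::toupper(ch)));
    }
    return out;
}

// Returns the GivenNameScore after the compressed check has been applied.
double applyCompressedScore(const std::string& queryGN,
                            const std::string& candidateGN,
                            double givenNameScore,
                            double givenNameCompressedScore,
                            bool checkCompressed) {
    assert(givenNameCompressedScore >= 0.0 && givenNameCompressedScore <= 1.0);
    if (!checkCompressed) return givenNameScore;
    if (compressGivenName(queryGN) == compressGivenName(candidateGN)) {
        // Exact compressed match: keep whichever score is higher.
        return std::max(givenNameScore, givenNameCompressedScore);
    }
    return givenNameScore;
}
```

With a GivenNameCompressedScore of .9, comparing "Anne Marie" with "AnneMarie" raises a previously calculated GivenNameScore of .6 to .9, while a previously calculated score of .95 is left unchanged.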

6.1.3 Determine if SurnameScore exceeds SurnameThreshold

The tool shall determine whether the candidate name is a potential match by checking to see if the
SurnameScore exceeds the SurnameThreshold prior to returning the candidate name as a potential
match.

If the SurnameScore exceeds the SurnameThreshold, then the tool shall determine that the candidate
name is still a potential match.

If the SurnameScore does not exceed the SurnameThreshold, then the tool shall determine that the
candidate name is no longer a potential match.

Even if the candidate name is no longer considered a potential match, the tool shall continue
processing in the event that all evaluated names were requested to be scored and returned marked
as match or no match.

6.1.4 Determine if GivenNameScore exceeds GivenNameThreshold

The tool shall determine whether the candidate name is a potential match by checking to see if the
GivenNameScore exceeds the GivenNameThreshold prior to returning the candidate name as a
potential match.

If the GivenNameScore exceeds the GivenNameThreshold, then the tool shall determine that the
candidate name is still a potential match.

If the GivenNameScore does not exceed the GivenNameThreshold, then the tool shall determine that 
the candidate name is no longer a potential match. 

Even if the candidate name is no longer considered a potential match, the tool shall continue 
processing in the event that all evaluated names were requested to be scored and returned marked 
as match or no match. 

6.1.5 Compute NameScore & Determine If Potential Match (NameThreshold, 
SurnameWeight, GivenNameWeight) 

If the SurnameWeight = GivenNameWeight = 0, then the tool shall set NameScore = 0.

If the SurnameWeight = GivenNameWeight <> 0, then the tool shall assign NameScore =
(SurnameScore + GivenNameScore) / 2.

If the SurnameWeight <> GivenNameWeight, then the tool shall assign

    NameScore = ((SurnameScore * SurnameWeight) + (GivenNameScore * GivenNameWeight))
                / (SurnameWeight + GivenNameWeight)

The tool shall then determine whether the candidate name is a potential match by checking to see if
the NameScore exceeds the NameThreshold prior to returning the candidate name as a potential
match.

If the NameScore exceeds the NameThreshold, then the tool shall determine that the candidate name
is still a potential match.

Developers will be able to establish their own method to determine if a candidate name is a potential
match. This will enable developers to integrate other data elements and other criteria in the final
name score, if desired. Note that developers may or may not choose to utilize the SurnameWeight
and GivenNameWeight factors in their method.

The tool shall populate the Results List with a candidate name based on whether it is identified as a 
potential match. If no Results List is being constructed, then the tool shall return the candidate name 
and its associated scores. 
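The NameScore rules in this section can be sketched as a single function, since the equal-weight case is the weighted formula with both weights equal. The names computeNameScore and isPotentialMatch are hypothetical, and reading "exceeds" as a strict greater-than comparison is an assumption.

```cpp
#include <cassert>

// Hypothetical helper, not part of the LAS API.
// Weighted combination of SurnameScore and GivenNameScore. When both weights
// are equal and non-zero this reduces to (SurnameScore + GivenNameScore) / 2.
double computeNameScore(double surnameScore, double givenNameScore,
                        double surnameWeight, double givenNameWeight) {
    assert(surnameWeight >= 0.0 && givenNameWeight >= 0.0);
    if (surnameWeight == 0.0 && givenNameWeight == 0.0) return 0.0;
    return (surnameScore * surnameWeight + givenNameScore * givenNameWeight) /
           (surnameWeight + givenNameWeight);
}

// "Exceeds" is read here as strictly greater than (an assumption).
bool isPotentialMatch(double nameScore, double nameThreshold) {
    return nameScore > nameThreshold;
}
```

For a SurnameScore of 1.0 and a GivenNameScore of .5, equal weights give .75, while a SurnameWeight of 3 and a GivenNameWeight of 1 give (3.0 + .5) / 4 = .875.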

6.2 Design Notes 

1. We considered performing an exact match on the name prior to performing the "fuzzy"
matching, but decided that the overhead of checking every candidate name as an exact match
was more than the cost of performing the "fuzzy" match when there is an exact match - our
assumption is that the tool will be evaluating more similar names than exact match names.

2. The following parameters have been supported by earlier versions of DNC and are not
proposed to be included in the tool.

• The following parameters were used with the DP2-based pass-1 search for DNC, and
were supplanted by the COF processor and, therefore, are now no longer functional in
DNC:

• KICKOUT

• PARTITION

• TEST VALUE

• PROXRETURN

• The following parameters were specifically supported in DNC for the State
Department:

• REFULEVx (REFULEV0 - REFULEV4)

• DOBFACTOR

• FIXLASTSEG

• CHKSUBSTR - (DNC does not use except for 1 COB)

• SUBSCORE - (DNC does not use)

• MINSEGLEN - (DNC does not use)

• LTRIGRAPH - (related to COF processing)

• NTRIGRAPH - (related to COF processing)

7. Produce and Manage Results

7.1 Functionality

7.1.1 Define Criteria for Results List 

7.1.1.1 Functionality

The Results List is defined as either an unordered candidate name list, or an ordered candidate name
list ordered by the relative probability that each candidate name is similar to a specified query name.
This probability is represented by the final NameScore. The Results List may include both similar and
dissimilar names, depending on the criteria for defining the results.

When a set of candidate names is to be evaluated, the tool shall enable the developer to define the
criteria for producing and managing their own Results List. These criteria shall include:

• establishing a type of Results List:

• 1 = an unordered list of all candidate names whose name score exceeds a pre-
defined name threshold (e.g., if the threshold = 0, all candidate names will be
returned in an unordered list);

• 2 = an ordered list of all candidate names whose name score exceeds a pre-
defined name threshold (e.g., if the threshold = 0, all candidate names will be
returned in an ordered list);

• 3 = an ordered list of the top X candidate names whose name score exceeds a
pre-defined name threshold, and where X is a number;

• establishing a size limit for the Results List, which in effect defines the value of X for
producing the top X candidate names;

• defining a SurnameThreshold;

• defining a GivenNameThreshold; and

• defining a NameThreshold.

If the NameThreshold is set to 0, the tool shall return all candidate names in a Results List, unless a
Results List size limit is established.
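The criteria listed above can be pictured as a small settings structure plus a filter. The sketch below is an illustration only: ResultsListType, ResultsCriteria, ScoredCandidate, and buildResultsList are hypothetical names, the SurnameThreshold and GivenNameThreshold checks are omitted for brevity, and ordering uses the NameScore alone without any tie-breaking rules.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The numeric values mirror the list types 1, 2, and 3 described above.
enum class ResultsListType { Unordered = 1, Ordered = 2, TopX = 3 };

struct ResultsCriteria {
    ResultsListType type = ResultsListType::Unordered;
    std::size_t sizeLimit = 0;   // the value of X; used only for TopX
    double nameThreshold = 0.0;  // 0 returns every candidate
};

struct ScoredCandidate {
    std::string name;
    double nameScore;
};

// Hypothetical helper, not part of the LAS API.
std::vector<ScoredCandidate> buildResultsList(
        const std::vector<ScoredCandidate>& candidates,
        const ResultsCriteria& criteria) {
    assert(criteria.type != ResultsListType::TopX || criteria.sizeLimit > 0);
    std::vector<ScoredCandidate> results;
    for (const auto& c : candidates) {
        if (criteria.nameThreshold == 0.0 || c.nameScore > criteria.nameThreshold) {
            results.push_back(c);
        }
    }
    if (criteria.type != ResultsListType::Unordered) {
        std::stable_sort(results.begin(), results.end(),
                         [](const ScoredCandidate& a, const ScoredCandidate& b) {
                             return a.nameScore > b.nameScore;  // descending
                         });
    }
    if (criteria.type == ResultsListType::TopX &&
        results.size() > criteria.sizeLimit) {
        results.resize(criteria.sizeLimit);  // keep only the top X
    }
    return results;
}
```

A caller would fill in a ResultsCriteria once per search and pass it along with the scored candidates; with type TopX, sizeLimit 2, and a NameThreshold of .5, only the two best candidates scoring above .5 are returned.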

7.1.1.2 Design Notes 



LAS © 1998 Language Analysis Systems, Inc.
Proprietary and Confidential

Page 61

Version 1.0 - Revised DRAFT
LAS Name Comparison Tools Functional Design

[...] in effect specifying its size, i.e., the value of X.

7.1.2 Produce Results

[...] in the Result List.

[...] NameScore.

[...] according to the following rules:

• first order the candidate names in descending order by each candidate's SurnameScore.

• if two candidates' SurnameScores are the same, then do the following:

    • if SurnameMode = "average" or "lowest", then do the following:

        • first order the candidate names in descending order by each candidate's
          GivenNameScore.

        • if two candidates' GivenNameScores are the same, then do the following:

            • if GivenNameMode = "average" or "lowest", then do the following:

                • first order the candidate names in ascending order by the
                  difference in the number of SN segments between the candidate
                  name and the query name.

                • if the difference in the number of SN segments between the
                  candidate name and the query name is the same, then order the
                  candidate names in ascending order by the difference in the
                  number of GN segments between the candidate name and the
                  query name.

            • if GivenNameMode = "highest", then do the following:

                • first order the candidate names in descending order by each
                  candidate's next highest GivenNameSegmentScore.

                • if two candidates' next highest GivenNameSegmentScores are
                  still all the same, then continue to evaluate the next highest
                  GivenNameSegmentScores (up to n, where n is the greater
                  number of Given Name Segments in the two evaluation Given
                  Names), and order the candidate names in descending order by
                  each candidate's next highest GivenNameSegmentScore. If one
                  of the evaluation Given Names has fewer segments than the
                  other evaluation Given Name, its missing
                  GivenNameSegmentScores will be set to .50 in order to evaluate
                  them.

                    • if two candidates' GivenNameSegmentScores are all the
                      same, then order the candidate names in ascending
                      order by the difference in the number of SN segments
                      between the candidate name and the query name.

                    • if the difference in the number of SN segments between
                      the candidate name and the query name is the same,
                      then order the candidate names in ascending order by
                      the difference in the number of GN segments between
                      the candidate name and the query name.

    • if SurnameMode = "highest", then do the following:

        • first order the candidate names in descending order by each candidate's next
          highest SurnameSegmentScore.

        • if two candidates' next highest SurnameSegmentScores are still all the same, then
          continue to evaluate the next highest SurnameSegmentScores (up to n, where n is
          the greater number of Surname Segments in the two evaluation surnames), and
          order the candidate names in descending order by each candidate's next highest
          SurnameSegmentScore. If one of the evaluation surnames has fewer segments
          than the other evaluation surname, its missing SurnameSegmentScores will be set
          to .60 in order to evaluate them.

            • if two candidates' next highest SurnameSegmentScores are all the same,
              then do the following:

                • first order the candidate names in descending order by each
                  candidate's GivenNameScore.

                • if two candidates' GivenNameScores are the same, then do the
                  following:

                    • if GivenNameMode = "average" or "lowest", then do the
                      following:

                        • first order the candidate names in ascending order by the
                          difference in the number of SN segments between the
                          candidate name and the query name.

                        • if the difference in the number of SN segments between
                          the candidate name and the query name is the same,
                          then order the candidate names in ascending order by
                          the difference in the number of GN segments between
                          the candidate name and the query name.

                    • if GivenNameMode = "highest", then do the following:

                        • first order the candidate names in descending order by
                          each candidate's next highest
                          GivenNameSegmentScore.

                        • if two candidates' next highest
                          GivenNameSegmentScores are still all the same, then
                          continue to evaluate the next highest
                          GivenNameSegmentScores (up to n, where n is the
                          greater number of Given Name Segments in the two
                          evaluation Given Names), and order the candidate
                          names in descending order by each candidate's next
                          highest GivenNameSegmentScore. If one of the
                          evaluation Given Names has fewer segments than the
                          other evaluation Given Name, its missing
                          GivenNameSegmentScores will be set to .50 in order to
                          evaluate them.

                            • if two candidates' GivenNameSegmentScores are
                              all the same, then order the candidate names in
                              ascending order by the difference in the number
                              of SN segments between the candidate name
                              and the query name.

                            • if the difference in the number of SN segments
                              between the candidate name and the query name
                              is the same, then order the candidate names in
                              ascending order by the difference in the number
                              of GN segments between the candidate name
                              and the query name.

If the developer specified the top X option, then the tool shall produce a Result List that contains the
X candidate names that are determined, after sorting, to be the most likely matches.

The tool shall provide the developer the capability to establish custom results ordering methods, if
desired.
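A custom ordering method of the kind mentioned above could take the form of a comparison function. The sketch below covers only the simplest branch of the tie-breaking rules, where SurnameMode and GivenNameMode are both "average" or "lowest": SurnameScore descending, then GivenNameScore descending, then the SN segment count difference ascending, then the GN segment count difference ascending. RankedCandidate and orderResults are hypothetical names, and the segment-score comparisons required for the "highest" modes are deliberately left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <tuple>
#include <vector>

struct RankedCandidate {
    std::string name;
    double surnameScore;
    double givenNameScore;
    int snSegments;  // number of surname segments in the candidate
    int gnSegments;  // number of given name segments in the candidate
};

// Hypothetical helper, not part of the LAS API. Covers only the branch where
// SurnameMode and GivenNameMode are both "average" or "lowest".
void orderResults(std::vector<RankedCandidate>& list,
                  int querySnSegments, int queryGnSegments) {
    assert(querySnSegments >= 0 && queryGnSegments >= 0);
    std::stable_sort(list.begin(), list.end(),
        [=](const RankedCandidate& a, const RankedCandidate& b) {
            const int aSn = std::abs(a.snSegments - querySnSegments);
            const int bSn = std::abs(b.snSegments - querySnSegments);
            const int aGn = std::abs(a.gnSegments - queryGnSegments);
            const int bGn = std::abs(b.gnSegments - queryGnSegments);
            // Scores sort descending (b's value on the left side), segment
            // count differences sort ascending (a's value on the left side).
            return std::tie(b.surnameScore, b.givenNameScore, aSn, aGn) <
                   std::tie(a.surnameScore, a.givenNameScore, bSn, bGn);
        });
}
```

Using std::stable_sort keeps candidates that tie on every key in their original relative order, which is a design choice of this sketch rather than a stated requirement.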

7.1.3 Retrieve Results 

The tool shall provide the developer the capability to retrieve matched candidate names from the
Results List and retrieve additional information about the candidate names to include, at a minimum,
the Given Name field, the Surname field, the GivenNameScore, the SurnameScore, and other data
that the developer may have defined.

8. EVALUATION FACTORS and PARAMETERS 

8.1 SurnameCheckInitial (previously known as ISSNINITL)

The SurnameCheckInitial indicates whether a single character in the surname segment will be treated
as an initial. When SurnameCheckInitial = "T", single characters in the query SN segment are treated
as initials and, if they match on a candidate SN segment, the tool will usually set the
SurnameSegmentScore = SurnameInitialScore (except when there is an exact match on initials, in
which case the SurnameSegmentScore = (1+SurnameInitialScore)/2. Exact matches on initials are
not considered "exact matches" because the initial may represent two different name segments).

Initials are relatively uncommon in the surname field. In some cases, such as Chinese names, single
characters are common in the surname field, in which case one will not want a single letter to be
treated as an initial, but rather, treated as a name. If treated as a name, a single character will be
analyzed using digraph matching, and will generally be given very little value, unless it matches on an
identical one character name. In such cases, SurnameCheckInitial should be set to "F".

SurnameCheckInitial

Possible Settings: {T,F}
Default: {F}



8.2 SurnameCheckVariant (previously known as CHKVARIANT) 

In many cases there are variant spellings for a surname that do not share many common digraphs.
One possible solution to this problem is the use of the SurnameCheckVariant parameter. When
SurnameCheckVariant = "T", a table containing Surname Variants is referenced during the evaluation
as well.

SurnameCheckVariant 

Possible Settings: {T, F} 
Default: {F} 

8.3 SurnameCheckBias (previously known as LDIBIAS)

The SurnameCheckBias was designed to aid in the analysis of Russian names and other naming
systems which make use of complex morphological endings. If SurnameCheckBias = "T", the tool
places more emphasis upon the beginning digraphs of a particular name and de-emphasizes later
digraphs. This is done to eliminate the effect of the morphological endings common to Russian
names. Because the names are generally long, and many of the names end with the same endings
(such as -ovich), candidates that were not very good were being easily returned because there were
many digraph matches. When SurnameCheckBias = "T", and when calculating a
SurnameSegmentScore, the first digraph in both the query and candidate SN segments is
assigned a weight factor of 1.00, the second digraph is assigned a weight factor of .9, and so forth
until .1 is reached, at which point the remainder of the digraphs are assigned a weight factor of .1.

If GivenNameCheckBias = "T", the tool shall apply bias by:

• first, assigning the first contributing digraph in each GN segment a weight factor of 1.00,
the second contributing digraph a weight factor of .9, and so on until the tenth contributing
digraph is reached, at which point all remaining digraphs shall be assigned a weight factor
of .1;

• then determining which digraphs match between the two GN segments;

• summing the weight factors assigned to the matched query GN digraphs with the weight factors
assigned to the matched candidate GN digraphs; and

• dividing by the sum of the weight factors of all contributing digraphs for both the query GN
and the candidate GN.



GivenNameCheckBias : T

Query     : Moskyovich, FNU
Candidate : Markovich, FNU

 -M    Mo    os    sk    ky    yo    ov    vi    ic    ch    h-
(1.0)  (.9)  (.8)  (.7)  (.6)  (.5)  (.4)  (.3)  (.2)  (.1)  (.1)

 -M    Ma    ar    rk    ko    ov    vi    ic    ch    h-
(1.0)  (.9)  (.8)  (.7)  (.6)  (.5)  (.4)  (.3)  (.2)  (.1)

The following digraphs match:

                            -M     ov     vi     ic     ch     h-
Query Weight Factors:      (1.0)  (.4)   (.3)   (.2)   (.1)   (.1)
Candidate Weight Factors:  (1.0)  (.5)   (.4)   (.3)   (.2)   (.1)

LAS © 1998 Language Analysis Systems, Inc.
Proprietary and Confidential

Page 66

Version 1.0 - Revised DRAFT
LAS Name Comparison Tools Functional Design

January 23, 1998



Matched Digraphs = (1.0+.4+.3+.2+.1+.1) + (1.0+.5+.4+.3+.2+.1) = 4.6

Total Possible Digraphs = (1.0+.9+.8+.7+.6+.5+.4+.3+.2+.1+.1) + (1.0+.9+.8+.7+.6+.5+.4+.3+.2+.1) = 11.1

Therefore, the GivenNameSegmentScore = (4.6 / 11.1) = 0.41
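The biased calculation above can be reproduced with a short sketch. The function names are hypothetical, the '-' boundary marker follows the example, and two points are assumptions: each candidate digraph may be matched at most once, and the input is already normalized (case, markers). Weights are kept as integer tenths so that the sums 4.6 and 11.1 are computed exactly as 46 and 111.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Splits a segment into digraphs with '-' boundary markers,
// e.g. "Markovich" -> "-M", "Ma", "ar", ..., "ch", "h-".
std::vector<std::string> digraphs(const std::string& segment) {
    const std::string padded = "-" + segment + "-";
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < padded.size(); ++i) {
        out.push_back(padded.substr(i, 2));
    }
    return out;
}

// Weight in tenths: 10, 9, ..., 2, 1, then 1 for every later digraph.
int biasWeightTenths(std::size_t position) {
    return position < 9 ? 10 - static_cast<int>(position) : 1;
}

// Hypothetical helper, not part of the LAS API.
double biasedSegmentScore(const std::string& query, const std::string& candidate) {
    const std::vector<std::string> q = digraphs(query);
    const std::vector<std::string> c = digraphs(candidate);
    int total = 0;
    for (std::size_t i = 0; i < q.size(); ++i) total += biasWeightTenths(i);
    for (std::size_t j = 0; j < c.size(); ++j) total += biasWeightTenths(j);

    std::vector<bool> used(c.size(), false);  // each candidate digraph matches once
    int matched = 0;
    for (std::size_t i = 0; i < q.size(); ++i) {
        for (std::size_t j = 0; j < c.size(); ++j) {
            if (!used[j] && q[i] == c[j]) {
                used[j] = true;
                matched += biasWeightTenths(i) + biasWeightTenths(j);
                break;
            }
        }
    }
    assert(total > 0);  // every segment yields at least one digraph
    return static_cast<double>(matched) / total;
}
```

For "Moskyovich" and "Markovich" the sketch computes 46 / 111, which rounds to the 0.41 shown above.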

There are some problems that may occur when SurnameCheckBias = "T". For example,
SurnameCheckBias was set to "T" during the LQA for Poland because of the high frequency
morphological endings. However, as a result, a common surname root such as "Kowal" returned
names such as "Kowalczyk," "Kowalewska," "Kowalewski," "Kowalska," "Kowalik," "Kowalow,"
"Kowal," and "Kowalkowska" (Memo# L94289). Therefore, names which were often not good hits
were returned because of the additional weight placed upon the beginning digraphs in a name.
Please see memos L94289 and L94290 for further explanation of the possible problems associated
with SurnameCheckBias.

When the SurnameCheckBias = "T", more emphasis is placed upon the beginning digraphs of a
name. When the SurnameCheckBias = "F", equal value is placed upon all matching digraphs.

SurnameCheckBias 

Possible Settings: {T,F} 
Default: {F} 

8.4 SurnameCheckUnknownNotExist, LastNameUnknownScore, NoLastNameScore

Surname Check Non-existent value - used to assign a higher score to a SN segment if there is no
value in one comparand SN segment, yet there is a value in the other comparand.

Query     : Malcolm LNU
Candidate : Malcolm Shabaz

If SurnameCheckUnknownNotExist = "F", the SurnameSegmentScore for "LNU" compared to
"Shabaz" would be 0. With SurnameCheckUnknownNotExist = "T", the SurnameSegmentScore for
"LNU" compared to "Shabaz" will be set to the LastNameUnknownScore. The
LastNameUnknownScore will be set fairly high to accommodate for missing values when it is unclear
whether there should or should not be a value. This would result in a higher SurnameSegmentScore,
which means that the candidate will be more likely to appear in the TOP X results as well as exceed
the NameThreshold if it is set above 0.

SurnameCheckUnknownNotExist : T
Query     : Malcolm NLN
Candidate : Malcolm Shabaz

In the second example above, the SurnameSegmentScore for "NLN" compared to "Shabaz" will be
set to the NoLastNameScore. The NoLastNameScore will typically be very low to accommodate for
the fact that there is no last name defined in the query but a last name appears in the candidate.
Thus, the candidate is not a likely match.

SurnameCheckUnknownNotExist

Possible Settings: {T, F}
Default: {F}

NoLastNameScore

Possible Settings: {0.0, 0.1, ... 1.0}
Default: {.80}

LastNameUnknownScore

Possible Settings: {0.0, 0.1, ... 1.0}
Default: {.85}
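The LNU and NLN handling described in this section can be sketched as follows. All type and function names are hypothetical; the default values are the ones listed above. The specification does not say what happens when both comparands hold a placeholder, so the sketch assumes the rule simply does not apply in that case.

```cpp
#include <cassert>
#include <string>

enum class SurnameKind { Named, Unknown /* LNU */, None /* NLN */ };

struct UnknownSurnameSettings {
    bool checkUnknownNotExist = false;   // SurnameCheckUnknownNotExist, default F
    double lastNameUnknownScore = 0.85;  // default listed above
    double noLastNameScore = 0.80;       // default listed above
};

SurnameKind classifySurname(const std::string& segment) {
    if (segment == "LNU") return SurnameKind::Unknown;
    if (segment == "NLN") return SurnameKind::None;
    return SurnameKind::Named;
}

// Hypothetical helper, not part of the LAS API.
// Returns true and fills `score` when exactly one comparand is a placeholder;
// returns false when normal segment scoring should be used instead.
bool scorePlaceholderSurname(const std::string& querySN,
                             const std::string& candidateSN,
                             const UnknownSurnameSettings& settings,
                             double& score) {
    assert(settings.lastNameUnknownScore >= 0.0 && settings.lastNameUnknownScore <= 1.0);
    assert(settings.noLastNameScore >= 0.0 && settings.noLastNameScore <= 1.0);
    const SurnameKind q = classifySurname(querySN);
    const SurnameKind c = classifySurname(candidateSN);
    const bool qNamed = (q == SurnameKind::Named);
    const bool cNamed = (c == SurnameKind::Named);
    if (qNamed == cNamed) return false;  // both named, or both placeholders (assumption)
    if (!settings.checkUnknownNotExist) {
        score = 0.0;
        return true;
    }
    const SurnameKind placeholder = qNamed ? c : q;
    score = (placeholder == SurnameKind::Unknown) ? settings.lastNameUnknownScore
                                                  : settings.noLastNameScore;
    return true;
}
```

With the setting at "T", "LNU" against "Shabaz" yields the LastNameUnknownScore and "NLN" against "Shabaz" yields the NoLastNameScore; with the setting at "F", both yield 0.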

8.5 SurnameCheckCompressed, SurnameCompressedScore

SurnameCheckCompressed 

Possible Settings: {T,F} 
Default: {F} 

SurnameCompressedScore 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.9} 

8.6 SurnameAnchorSegment, SurnameAnchorFactor (previously known as
ANCHSEG, ANCHVAL)

In order to determine the relative position in both comparands, the tool shall establish an "index" of a
segment based on the SurnameAnchorSegment. For SurnameAnchorSegment = "none" or "first", the
tool shall count segments from left to right. For SurnameAnchorSegment = "last", the tool shall count
segments from right to left.

Either SurnameOutOfPositionFactor or SurnameAnchorFactor, but not both, can be applied to the
SurnameSegmentScore. Thus, if SurnameOutOfPositionFactor has already been applied, then the
SurnameAnchorFactor cannot be applied, even if the SurnameAnchorSegment = "first" or "last". If
SurnameOutOfPositionFactor does not apply, then the SurnameAnchorFactor may apply if
SurnameMode is "highest" or "lowest". If SurnameMode is "average", the SurnameAnchorFactor is
not applied.

The SurnameAnchorSegment is used in compound surnames to emphasize the importance of one
segment of the name over another. For example, when dealing with Portuguese names, it is the
second (i.e., last) surname which is more important, whereas when dealing with other Hispanic
names it is generally the first surname which is of primary importance.

When the SurnameAnchorSegment = "none", neither segment in the name is given more weight (i.e.,
basically the SurnameAnchorSegment is turned off). When the SurnameAnchorSegment = "first",
both comparand segments are left-aligned and the left-most or first segment in the name is given
more weight. When the SurnameAnchorSegment = "last", both comparand segments are right-
aligned and the last segment in the name is given more weight. Thus, if only two surname segments
exist, then the right-most or last segment is given more weight if SurnameAnchorSegment = "last".

The way in which the SurnameAnchorSegment gives more weight to certain name segments is
through the use of the SurnameAnchorFactor. For example, if the SurnameAnchorSegment =
"first", but neither of the similar digraph names is in the first position, then the
SurnameSegmentScore is multiplied by the SurnameAnchorFactor.



SurnameAnchorSegment : first 
SurnameOutOfPositionFactor : .65 
SurnameMode : highest 

SurnameAnchorFactor : .70 

Query : Lopez Garcia, Luis
Candidate : Santos Garcia, Luis



In this example, since the SurnameAnchorSegment = "first", the two comparands are left-aligned. After
left-aligning the comparands, more weight should be given to the name in the first position. The
name "Garcia" matches; however, it is in the last position in both the query and the candidate (i.e., it is
not in the first position, which is the SurnameAnchorSegment). Therefore, the
SurnameSegmentScore (1.00) is multiplied by the SurnameAnchorFactor (.70), thus yielding a score of
SurnameSegmentScore * SurnameAnchorFactor = 1.00 * .70 = .70. Because the SurnameMode =
"highest", the SurnameScore = the highest SurnameSegmentScore (1.0). Therefore, the way in
which one surname is given more weight is actually to devalue the other or give it less weight.

If the same parameter settings are in effect, and the segments in the first position in both comparands
are being compared, then the SurnameSegmentScore is not devalued. For example:



SurnameAnchorSegment : first 
SurnameOutOfPositionFactor : .60 
SurnameMode : highest 

SurnameAnchorFactor : .70

Query : Gonzalez Garcia, Mario
Candidate : Gonzalez Salvador, Mario



In this case, after left-alignment, the name "Gonzalez" is considered to be in the first position in both
comparands, and since the SurnameAnchorSegment = "first", it receives a SurnameSegmentScore of
1.00.

Similarly, if the SurnameAnchorSegment = "last", the emphasis is upon the last element in the
surname. In the following example, after right-alignment, the names "Lopez" and "Santos" do not
share any common digraphs, and therefore receive a SurnameSegmentScore of 0. However, the
name "Garcia" matches and is in the second position in both comparands; therefore it receives a score of
1.00 and is not multiplied by the SurnameAnchorFactor since the SurnameAnchorSegment = "last".
Since the SurnameMode = "highest", the SurnameScore = 1.00, which is the highest
SurnameSegmentScore.



SurnameAnchorSegment : last 
SurnameOutOfPositionFactor : .65
SurnameMode : highest 
SurnameAnchorFactor : .70 

Query : Lopez Garcia, Luis 
Candidate : Santos Garcia, Luis 



In contrast, if the evaluated segments (i.e., Lopez and Santos in the example above) are not in the
last position, the SurnameSegmentScore will be multiplied by the SurnameAnchorFactor.



SurnameAnchorSegment : last

Query : Gomez Hernandez, Mario
Candidate : Gomez Lopes, Mario



Although the name "Gomez" is an exact match, it is not in the last position in either comparand, and
the SurnameAnchorSegment = "last". Therefore, "Gomez" is multiplied by the
SurnameAnchorFactor. The SurnameSegmentScore (1.00) is multiplied by the
SurnameAnchorFactor (.70), producing the SurnameSegmentScore = (.70).

The value of the SurnameAnchorSegment determines which of the name segments, if any, is to
receive the most weight. Raising the SurnameAnchorFactor will actually give more value to the
segment that is not the SurnameAnchorSegment, while lowering the SurnameAnchorFactor will lower
the value of the segment that is not the SurnameAnchorSegment.

SurnameAnchorSegment

Possible Settings: {first, last, none}
Average Range: {first, last, none}
Default: {none}

SurnameAnchorFactor

Possible Settings: {0.00, 0.01, ... 1.00}
Average Settings: {.50 ... .70}
Default: {.70}
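The interaction between SurnameAnchorSegment, SurnameAnchorFactor, and SurnameOutOfPositionFactor described in this section can be sketched as below. The type and function names are hypothetical. The SurnameAnchorFactor default of .70 is the one listed above, while the .65 used for SurnameOutOfPositionFactor is only the value from the examples, not a documented default.

```cpp
#include <cassert>
#include <cstddef>

enum class Anchor { None, First, Last };
enum class Mode { Highest, Average, Lowest };

struct PositionSettings {
    Anchor anchor = Anchor::None;
    Mode mode = Mode::Average;
    double outOfPositionFactor = 0.65;  // example value, not a documented default
    double anchorFactor = 0.70;
};

// Index of a segment counted from the anchor side: left to right for
// "none"/"first", right to left for "last".
std::size_t anchoredIndex(std::size_t positionFromLeft, std::size_t segmentCount,
                          Anchor anchor) {
    assert(positionFromLeft < segmentCount);
    return anchor == Anchor::Last ? segmentCount - 1 - positionFromLeft
                                  : positionFromLeft;
}

// Hypothetical helper, not part of the LAS API.
// Applies at most one of the two factors to a raw SurnameSegmentScore.
double adjustSegmentScore(double rawScore, std::size_t queryIndex,
                          std::size_t candidateIndex, const PositionSettings& s) {
    if (queryIndex != candidateIndex) {
        return rawScore * s.outOfPositionFactor;  // out of position
    }
    const bool anchorActive = s.anchor != Anchor::None && s.mode != Mode::Average;
    if (anchorActive && queryIndex != 0) {
        return rawScore * s.anchorFactor;  // in position, but not the anchor segment
    }
    return rawScore;
}
```

With the anchor set to "first" and the mode set to "highest", "Garcia" in "Lopez Garcia" against "Santos Garcia" has anchored index 1 in both names and is reduced to .70, while "Gonzalez" at index 0 in both names keeps its 1.00.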



8.7 SurnameCheckTAQ 

When the SurnameCheckTAQ = "off", no TAQ processing will take place at all.

When the SurnameCheckTAQ = "remove", then TAQ(s) will simply be removed from the name data.

When the SurnameCheckTAQ = "score", TAQ(s) will be identified, removed, and associated with
each relevant name segment during preprocessing, and then the SurnameTAQDeleteFactor,
SurnameTAQDeleteAbsentFactor, SurnameTAQDisregardFactor, and
SurnameTAQDisregardAbsentFactor will be multiplied against the SurnameSegmentScore, which
will in effect reduce the value of the SurnameSegmentScore.

SurnameCheckTAQ 

Possible Settings: {off, remove, score} 
Default: {score} 

8.8 SurnameMode (previously known as SNMODE) 

The SurnameMode can be set to "highest", "average", or "lowest" depending upon how flexible or
stringent one wants the parameters to be. "Highest" is the most flexible mode setting and "lowest" is
the most stringent setting. SurnameMode only has an effect if there is more than one name in the
surname. If there is more than one name in the surname, and the SurnameMode = "highest", then
the SurnameScore will be set to the highest SurnameSegmentScore.



SurnameMode : highest 

Query : Lopez Garcia, Maria 
Candidate: Lopez Gonzalez, Maria 



In this example, the highest SurnameSegmentScore will be used to evaluate the surname "Lopez
Gonzalez". "Lopez" in the candidate matches "Lopez" in the query exactly, thus receiving a score of
1.00. Since "Gonzalez" has very few digraph matches with "Garcia", the SurnameScore will be set to
1.00, which is the highest SurnameSegmentScore.

If the SurnameMode = "average", the SurnameScore will be set to the average of the
SurnameSegmentScores.



SurnameMode : average

Query : Lopez Garcia, Maria
Candidate: Lopez Gonzalez, Maria



The candidate segment "Lopez" matches the query segment "Lopez" with a score of 1.00. However,
there is only one digraph match between "Garcia" and "Gonzalez", yielding a score of 1/9 or .11.
Since SurnameMode = "average", the SurnameScore will be set to the average of the two
SurnameSegmentScores. The average of the two SurnameSegmentScores in this example is
(1+.11)/2 = .56.

If the SurnameMode = "lowest", and if there is more than one name in the surname, then the
SurnameScore will be set to the lowest SurnameSegmentScore. In the example illustrated above,
the SurnameScore will be set to .11. Clearly, SurnameMode = "lowest" is the most stringent setting.

If a threshold is defined to identify "hits", then setting the SurnameMode to "highest" will result in the
return of more hits. Raising the SurnameMode to "average" will decrease the number of hits returned
since the average score of the surnames must also pass the threshold. Raising the SurnameMode to
"lowest" will further decrease the number of hits returned since both names must pass the threshold.

SurnameMode: 

Possible Settings: {highest, average, lowest} 
Average Range: {highest, average, lowest} 
Default: {average} 
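The effect of the three SurnameMode settings on a set of already selected SurnameSegmentScores can be sketched in a few lines. SurnameMode as an enum and the function surnameScoreFor are hypothetical names; how the segment scores are selected from the evaluation matrix is outside this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

enum class SurnameMode { Highest, Average, Lowest };

// Hypothetical helper, not part of the LAS API.
// Combines the selected SurnameSegmentScores into one SurnameScore.
double surnameScoreFor(const std::vector<double>& segmentScores, SurnameMode mode) {
    assert(!segmentScores.empty());
    if (mode == SurnameMode::Highest) {
        return *std::max_element(segmentScores.begin(), segmentScores.end());
    }
    if (mode == SurnameMode::Lowest) {
        return *std::min_element(segmentScores.begin(), segmentScores.end());
    }
    const double sum = std::accumulate(segmentScores.begin(), segmentScores.end(), 0.0);
    return sum / static_cast<double>(segmentScores.size());
}
```

For the segment scores 1.00 and .11 from the example above, the three modes give 1.00, approximately .56, and .11 respectively, which illustrates why "highest" returns the most hits against a fixed threshold and "lowest" the fewest.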

8.9 SurnameExactInitialMatchScore

If SurnameCheckInitial is set to True, then the SurnameExactInitialMatchScore is used to indicate
whether two single characters that match one another should be considered "exact matches", and
therefore be assigned a score of 1.0. In some cases, it may be desirable to not consider two single
characters as an exact match since it is possible that the two characters may represent two different
names. In these cases, one might want to set the SurnameExactInitialMatchScore =
(1+SurnameInitialScore)/2.

SurnameExactInitialMatchScore

Possible Settings: {0.00, 0.1, ... 1.00}
Average Settings: {1.0}

Default: {1.0}

8.10 SurnameInitialScore

The SurnameInitialScore behaves in the same manner as the GivenNameInitialScore but it applies to
surnames rather than given names. This parameter will be useful in dealing with Hispanic surnames,
which frequently use an initial to represent High Frequency second surnames.

SurnameInitialScore

Possible Settings: {0.00, 0.1, ... 1.00}
Average Settings: {.60 ... .90}
Default: {.85}

8.11 SNV-SCORE

The SNV-SCORE is the value given to a pair of Surname variants found in the SURNAME-VARIANT
Table. The SNV-SCORE is generally set very high, usually at .95.

SNV-SCORE

Possible Settings: {0.00, 0.01, ... 1.00}
Default: {defined by variant pair}

8.12 SurnameOutOfPositionFactor, SurnameAnchorSegment (previously known as
SNOOPS, ANCHSEG)

In order to determine the relative position of name segments in both the query and candidate, the tool
shall establish an "index" of a segment based on the SurnameAnchorSegment. For
SurnameAnchorSegment = "none" or "first", the tool shall left-align the name segments. For
SurnameAnchorSegment = "last", the tool shall right-align the name segments.

The SurnameOutOfPositionFactor only applies to name segments that are out of position (i.e.,
not in the same relative position). When a surname segment is out of position, the
SurnameSegmentScore is multiplied by the SurnameOutOfPositionFactor. In the following
example, after left-aligning, the candidate name segments "Garcia" and "Gonzalez" are both
considered to be out of position.



SurnameMode : average
SurnameOutOfPositionFactor : .65
SurnameAnchorSegment : first

Query : Gacria Gonzalez, Mario
Candidate : Gonzalez Garcia, Mario

The name "Gonzalez" is an exact match but it is out of position. Therefore, it receives a value of .65
(SurnameOutOfPositionFactor) x 1.00 (SurnameSegmentScore) = .65.

The name "Garcia" is not an exact match. Of the fourteen digraphs, four digraph pairs (eight
digraphs) match.

The total possible digraphs are:

#G Ga ar rc ci ia a#
#G Ga ac cr ri ia a#

The matched digraphs are:

#G Ga ia a#
#G Ga ia a#

Therefore, the name receives a score of .65 (SurnameOutOfPositionFactor) x .57
(SurnameSegmentScore = 8/14 = .57) = .37.

The SurnameMode in this example = "average". Therefore, the SurnameScore = the average of the
two SurnameSegmentScores, which is .51 = ((.65 + .37)/2).
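
The arithmetic in this example can be reproduced with a short C++ sketch. It is illustrative only and is not the tool's actual implementation: the function names are hypothetical, and the normalization (matched digraph pairs divided by the digraph count of the longer name, which equals 8/14 for two six-letter names) is an assumption chosen because it reproduces the figures quoted here and the .125 quoted for "M" against "Mohamed" in section 8.31.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Split a name into digraphs, padding both ends with '#' as in the example.
std::vector<std::string> digraphs(const std::string& name) {
    const std::string padded = "#" + name + "#";
    std::vector<std::string> out;
    for (std::size_t i = 0; i + 1 < padded.size(); ++i) {
        out.push_back(padded.substr(i, 2));
    }
    return out;
}

// Hypothetical segment score: matched digraph pairs / digraphs in the longer name.
double segmentScore(const std::string& a, const std::string& b) {
    const std::vector<std::string> da = digraphs(a);
    const std::vector<std::string> db = digraphs(b);
    std::map<std::string, int> remaining;
    for (const std::string& d : da) ++remaining[d];
    int matched = 0;
    for (const std::string& d : db) {
        auto it = remaining.find(d);
        if (it != remaining.end() && it->second > 0) {
            --it->second;  // each digraph may be matched only once
            ++matched;
        }
    }
    return static_cast<double>(matched) / std::max(da.size(), db.size());
}

// Apply the out-of-position factor only when the segment is out of position.
double positionedScore(double score, bool inPosition, double outOfPositionFactor) {
    return inPosition ? score : score * outOfPositionFactor;
}

// SurnameMode = "average": mean of the segment scores.
double averageScore(const std::vector<double>& scores) {
    double sum = 0.0;
    for (double s : scores) sum += s;
    return scores.empty() ? 0.0 : sum / scores.size();
}
```

With a factor of .65, positionedScore(segmentScore("Gonzalez", "Gonzalez"), false, .65) gives .65 and positionedScore(segmentScore("Garcia", "Gacria"), false, .65) gives about .37, so averageScore of the two is about .51.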

8.12.1 SurnameOutOfPositionFactor With Surnames Containing Only 1 Name 
Segment 

In the following example, the name "Sanchez" is considered to be out of position. 



SurnameAnchorSegment : first 

Query : Ramirez Sanchez, Luis 
Candidate : Sanchez, Luis 



In the query, "Sanchez" is considered to be in the last position, whereas in the candidate, "Sanchez"
is considered to be in the first position. Therefore, the SurnameScore =
SurnameOutOfPositionFactor multiplied by the SurnameSegmentScore (1.00).

If the SurnameAnchorSegment = "last", then Sanchez would be considered to be in position, and the 
SurnameOutOfPositionFactor would not be applied. 
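
The effect of the anchor setting on this example can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the tool's code: the enum and function names are hypothetical, and "relative position" is assumed to mean the segment's zero-based distance from the aligned end of the name.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical representation of the SurnameAnchorSegment setting.
enum class AnchorSegment { None, First, Last };

// Position of segment `index` (0-based from the left) in a name of `count`
// segments, measured from the aligned end: from the left for "none"/"first",
// from the right for "last".
std::size_t relativeIndex(std::size_t index, std::size_t count, AnchorSegment anchor) {
    return anchor == AnchorSegment::Last ? count - 1 - index : index;
}

// A matched segment is in position when both names place it at the same
// relative index.
bool isInPosition(std::size_t queryIndex, std::size_t queryCount,
                  std::size_t candidateIndex, std::size_t candidateCount,
                  AnchorSegment anchor) {
    return relativeIndex(queryIndex, queryCount, anchor) ==
           relativeIndex(candidateIndex, candidateCount, anchor);
}
```

For "Sanchez" (segment 1 of 2 in the query, segment 0 of 1 in the candidate), isInPosition returns false with AnchorSegment::First and true with AnchorSegment::Last, matching the two cases described above.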

If a NameThreshold is defined, raising the SurnameOutOfPositionFactor will generally result in the
return of more names and lowering the SurnameOutOfPositionFactor will make it more difficult for
names to pass the threshold.

SurnameOutOfPositionFactor:

Possible Settings: {0.00, 0.01, ... 1.00}
Average Range: {.50 ... .70}
Default: {.60}

8.13 SurnameTAQDisregardAbsentFactor

absent Surname Disregard TAQ score - refer to section on TAQ scoring in main document for
description.

SurnameTAQDisregardAbsentFactor

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.80}

8.14 SurnameTAQDeleteAbsentFactor

absent Surname Delete TAQ score - refer to section on TAQ scoring in main document for
description.

SurnameTAQDeleteAbsentFactor

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.90}

8.15 SurnameTAQDeleteFactor

delete Surname TAQ score - refer to section on TAQ scoring in main document for description.

SurnameTAQDeleteFactor

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.85}

8.16 SurnameTAQDisregardFactor

disregard Surname TAQ score - refer to section on TAQ scoring in main document for description.

SurnameTAQDisregardFactor

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.7}

8.17 LastNameUnknownScore 

If one of the comparands has been identified as having "last name unknown", then the segment score 
assigned when comparing that comparand with another is the LastNameUnknownScore.

LastNameUnknownScore

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.6} 

8.18 NoLastNameScore 

If one of the comparands has been identified as having "no last name", then the segment score 
assigned when comparing that comparand with another is the NoLastNameScore.

NoLastNameScore 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.65} 

8.19 SurnameCompressedScore

In some instances, TAQ values become conjoined with stems in unpredictable ways. In some
instances, two surname comparands are exact matches except for spacing (e.g., "de la Garcia" and
"delaGarcia"). If this is determined to be the case, the tool will assign the SurnameCompressedScore
to the SurnameScore.

SurnameCompressedScore

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.9}
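
The "exact match except for spacing" test can be sketched in C++ as below. This is an assumption about how such a check might look, not the tool's implementation; the function names are hypothetical and the comparison is case-sensitive for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Return a copy of the name with all whitespace removed.
std::string removeSpaces(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char ch) { return std::isspace(ch) != 0; }),
            s.end());
    return s;
}

// True when the two names differ, but only in their spacing.
bool differsOnlyBySpacing(const std::string& a, const std::string& b) {
    return a != b && removeSpaces(a) == removeSpaces(b);
}
```

A caller would assign the SurnameCompressedScore when differsOnlyBySpacing("de la Garcia", "delaGarcia") is true, and fall back to normal scoring otherwise.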



8.20 SurnameThreshold (previously known as SNTHRESH)

The SurnameThreshold is the threshold which the SurnameScore must exceed in order for the
candidate name to be included in the Results list. If a developer wants to define a threshold rather
than return the TOP X names, then this parameter may be set to some value other than 0. Setting
the SurnameThreshold to 0 essentially turns off the SurnameThreshold. As the SurnameThreshold is
raised, fewer candidate names will be returned as it will be more difficult for a candidate name to pass
the higher SurnameThreshold. Conversely, as the SurnameThreshold is lowered, more candidate
names will be returned as it will be easier for a candidate name to pass the lower SurnameThreshold.

SurnameThreshold

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.50} 

8.21 SurnameWeight

The SurnameWeight is the factor (weight) that can be applied to the SurnameScore when
determining whether a candidate name is to be included in the Results list. This weight factor
enables one to assign more or less emphasis to a potential candidate based on the SurnameScore.
The higher the SurnameWeight, the greater the contribution of the SurnameScore to the overall
NameScore. If the SurnameWeight is set to 0, the SurnameScore will not contribute any value to the
overall NameScore. In Version 1, the exception to this occurs if the GivenNameWeight is also set to
0, in which case the weight factors cancel one another out. In Version 1, we multiply the
SurnameScore by the SurnameWeight as part of the default overall NameScore calculation. Note
that developers may or may not choose to apply the SurnameWeight when calculating an overall
NameScore if they create a different scoring algorithm.

SurnameWeight

Possible Settings: {0.0, 0.1, ...1.0}
Default: {1.0}

8.22 GivenNameCheckInitial (previously known as ISGNINITL)

The GivenNameCheckInitial behaves the same as SurnameCheckInitial, but applies to given names
rather than surnames.

GivenNameCheckInitial

Possible Settings: {T, F} 
Default: {1} 

8.23 GivenNameCheckVariant (previously known as CHKVARIANT) 

The GivenNameCheckVariant behaves in the same manner as the SurnameCheckVariant, but 
applies to given names rather than surnames. When SurnameCheckVariant = "T", a table containing
GN Variants is referenced during the evaluation as well. 

GivenNameCheckVariant 

Possible Settings: {T, F} 
Default: {T} 

8.24 GivenNameCheckBias 

This parameter behaves in the same manner as the SurnameCheckBias but it applies to given names 
rather than surnames. 

GivenNameCheckBias 

Possible Settings: {T, F}
Default: {F} 

8.25 GivenNameCheckUnknownNotExist, NoFirstNameScore, 
FirstNameUnknownScore 

GivenNameCheckUnknownNotExist is similar to SurnameCheckUnknownNotExist except that it
applies to the GN field. The parameters for GivenNameCheckUnknownNotExist are also specific to
the GN field.

GivenNameCheckUnknownNotExist
Possible Settings: {T, F} 
Default: {F} 

NoFirstNameScore 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.80} 

FirstNameUnknownScore 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.85} 

8.26 GivenNameCheckCompressed, GivenNameCompressedScore 

GivenNameCheckCompressed 
Possible Settings: {T,F} 
Default: {F} 

GivenNameCompressedScore 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.9} 

8.27 GivenNameAnchorSegment, GivenNameAnchorFactor 

The GivenNameAnchorSegment behaves in the same manner as the SurnameAnchorSegment but it
applies to given names rather than surnames.

The GivenNameAnchorFactor behaves in the same manner as the SurnameAnchorFactor but it 
applies to given names rather than surnames. 

GivenNameAnchorSegment 

Possible Settings: {first, last, none}
Average Range: {first, last, none}
Default: {none}

GivenNameAnchorFactor

Possible Settings: {0.00, 0.01, ... 1.00}
Average Settings: {.50 ... .70}
Default: {.70}

8.28 GivenNameCheckTAQ 

When the GivenNameCheckTAQ = "off", no TAQ processing will take place at all.

When the GivenNameCheckTAQ = "remove", then TAQ(s) will simply be removed from the name
data.

When the GivenNameCheckTAQ = "score", TAQ(s) will be identified, removed, and associated with
each relevant name segment during preprocessing, and then the GivenNameTAQDeleteFactor,
GivenNameTAQDeleteAbsentFactor, GivenNameTAQDisregardFactor, and
GivenNameTAQDisregardAbsentFactor will be multiplied against the GivenNameSegmentScore,
which will in effect reduce the value of the GivenNameSegmentScore.

GivenNameCheckTAQ

Possible Settings: {off, remove, score} 
Default: {score} 

8.29 GivenNameMode 

GivenNameMode operates exactly the same way as the SurnameMode but it applies to given names
rather than surnames. 

GivenNameMode: 

Possible Settings: {highest, average, lowest} 
Average Range: {highest, average, lowest} 
Default: {average} 

8.30 GivenNameExactInitialMatchScore

If GivenNameCheckInitial is set to True, then the GivenNameExactInitialMatchScore is used to
indicate whether two single characters that match one another should be considered "exact matches",
and therefore be assigned a score of 1.0. In some cases, it may be desirable not to consider two
single characters as an exact match, since the two characters may represent two
different names. In these cases, one might want to set the GivenNameExactInitialMatchScore =
(1 - GivenNameInitialScore)/2.

GivenNameExactInitialMatchScore

Possible Settings: {0.00, 0.1, ... 1.00} 
Average Settings: {1.0} 
Default: {1.0} 

8.31 GivenNameInitialScore

The GivenNameInitialScore deals with the treatment of initials during a name check. In the following
example, the initial "M" in the candidate could correspond to the name "Mohamed" in the query.
Instead of considering it as a single digraph match, which in this case would yield a score of .125,
the "M" is given the value of the GivenNameInitialScore.

GivenNameInitialScore : .85
Query : Ali, Mohamed
Candidate : Ali, M

If a NameThreshold is defined, raising the GivenNameInitialScore will result in the return of more
good hits since the value of an initial in a potential hit has been raised. Likewise, lowering the
GivenNameInitialScore will result in a decrease in the number of hits returned.

GivenNameInitialScore

Possible Settings: {0.00, 0.01, ... 1.00}
Average Settings: {.60 ... .96}
Default: {.85}
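
The interaction between GivenNameInitialScore and GivenNameExactInitialMatchScore can be sketched in C++ as follows. This is a simplified reading of sections 8.30 and 8.31, not the tool's implementation: the function name and the `otherwise` parameter (the score the caller would use when the initial rules do not apply) are hypothetical, and the comparison is case-sensitive.

```cpp
#include <cassert>
#include <string>

// Score two given-name segments when at least one of them is a single
// character. Returns `otherwise` when the initial rules do not apply.
double scoreWithInitials(const std::string& query, const std::string& candidate,
                         double initialScore, double exactInitialMatchScore,
                         double otherwise) {
    if (query.empty() || candidate.empty()) return otherwise;
    const bool queryIsInitial = query.size() == 1;
    const bool candidateIsInitial = candidate.size() == 1;
    if (!queryIsInitial && !candidateIsInitial) return otherwise;  // no initial involved
    if (query[0] != candidate[0]) return otherwise;                // initial does not match
    // Two matching single characters use the exact-initial score;
    // an initial against a full name uses the initial score.
    return (queryIsInitial && candidateIsInitial) ? exactInitialMatchScore : initialScore;
}
```

For the example above, scoreWithInitials("Mohamed", "M", .85, 1.0, .125) returns .85 rather than the .125 digraph score.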

8.32 GNV-SCORE

The GNV-SCORE is the value given to a pair of Given Name variants found in the GIVEN-NAME-
VARIANT Table. The GNV-SCORE is generally set very high, usually at .95.

GNV-SCORE

Possible Settings: {0.00, 0.01, ... 1.00}
Default: {defined by variant pair}

8.33 GivenNameOutOfPositionFactor (previously known as GNOOPS) 

The GivenNameOutOfPositionFactor factor operates in the same manner as the 
SurnameOutOfPositionFactor. 

GivenNameOutOfPositionFactor: 

Possible Settings: {0.00, 0.01,... 1.00} 
Average Range: {.50 ... .70}
Default: {.55} 

8.34 GivenNameTAQDisregardAbsentFactor 

absent GN Disregard TAQ score - refer to section on TAQ scoring in main document for description. 

GivenNameTAQDisregardAbsentFactor 
Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.80} 

8.35 GivenNameTAQDeleteAbsentFactor 

absent GN Delete TAQ score - refer to section on TAQ scoring in main document for description.

GivenNameTAQDeleteAbsentFactor

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.90} 

8.36 GivenNameTAQDeleteFactor 

delete GN TAQ score - refer to section on TAQ scoring in main document for description. 

GivenNameTAQDeleteFactor 

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.85} 

8.37 GivenNameTAQDisregardFactor 

disregard GN TAQ score - refer to section on TAQ scoring in main document for description. 

GivenNameTAQDisregardFactor 

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.7} 



8.38 FirstNameUnknownScore 

If one of the comparands has been identified as having "first name unknown", then the segment score 
assigned when comparing that comparand with another is the FirstNameUnknownScore. 

FirstNameUnknownScore 

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.6} 

8.39 NoFirstNameScore 

If one of the comparands has been identified as having "no first name", then the segment score 
assigned when comparing that comparand with another is the NoFirstNameScore. 

NoFirstNameScore 

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.65} 

8.40 GivenNameCompressedScore 

In some instances, TAQ values become conjoined with stems in unpredictable ways. In some
instances, two given name comparands are exact matches except for spacing (e.g., "nur al din" and
"nuraldin"). If this is determined to be the case, the tool will assign the GivenNameCompressedScore
to the GivenNameScore.

GivenNameCompressedScore

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.9} 



8.41 GivenNameThreshold (previously known as GNTHRESH)

The GivenNameThreshold is the threshold which the GivenNameScore must exceed in order for the
candidate name to be included in the Results list. If a developer wants to define a threshold rather
than return the TOP X names, then this parameter may be set to some value other than 0. Setting
the GivenNameThreshold to 0 essentially turns off the GivenNameThreshold. As the
GivenNameThreshold is raised, fewer candidate names will be returned as it will be more difficult for
a candidate name to pass the higher GivenNameThreshold. Conversely, as the
GivenNameThreshold is lowered, more candidate names will be returned as it will be easier for a
candidate name to pass the lower GivenNameThreshold.

GivenNameThreshold 

Possible Settings: {0.0, 0.1, ...1.0} 
Default: {.50} 

8.42 GivenNameWeight 

The GivenNameWeight is the factor (weight) that can be applied to the GivenNameScore when
determining whether a candidate name is to be included in the Results list. This weight factor
enables one to assign more or less emphasis to a potential candidate based on the
GivenNameScore. The higher the GivenNameWeight, the greater the contribution of the
GivenNameScore to the overall NameScore. If the GivenNameWeight is set to 0, the GivenNameScore will
not contribute any value to the overall NameScore. In Version 1, the exception to this occurs if the
SurnameWeight is also set to 0, in which case the weight factors cancel one another out. In Version
1, we multiply the GivenNameScore by the GivenNameWeight as part of the default overall
NameScore calculation. Note that developers may or may not choose to apply the
GivenNameWeight when calculating an overall NameScore if they create a different scoring
algorithm.

GivenNameWeight

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.80} 

8.43 NameThreshold 

The NameThreshold is the threshold which the NameScore must exceed in order for the candidate
name to be included in the Results list. If a developer wants to define a threshold rather than return
the TOP X names, then this parameter may be set to some value other than 0. Setting the
NameThreshold to 0 essentially turns off the NameThreshold. As the NameThreshold is raised,
fewer candidate names will be returned as it will be more difficult for a candidate name to pass the
higher NameThreshold. Conversely, as the NameThreshold is lowered, more candidate names will
be returned as it will be easier for a candidate name to pass the lower NameThreshold.

NameThreshold

Possible Settings: {0.0, 0.1, ...1.0}
Default: {.60}
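
The threshold behavior described in this section can be sketched in C++ as follows. This is a minimal illustration, not the tool's API: the ScoredName type and function name are hypothetical, and treating a threshold of 0 as "keep every candidate" is an assumption based on the statement that 0 turns the threshold off.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical pairing of a candidate name with its overall NameScore.
struct ScoredName {
    std::string name;
    double nameScore;
};

// Keep candidates whose NameScore exceeds the threshold, ordered best first.
std::vector<ScoredName> applyNameThreshold(std::vector<ScoredName> candidates,
                                           double nameThreshold) {
    if (nameThreshold > 0.0) {  // assumption: 0 disables filtering entirely
        candidates.erase(
            std::remove_if(candidates.begin(), candidates.end(),
                           [nameThreshold](const ScoredName& c) {
                               return !(c.nameScore > nameThreshold);
                           }),
            candidates.end());
    }
    // Stable sort keeps the input order for candidates with equal scores.
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const ScoredName& a, const ScoredName& b) {
                         return a.nameScore > b.nameScore;
                     });
    return candidates;
}
```

Raising nameThreshold shrinks the returned list and lowering it grows the list, as the section describes; a score exactly equal to the threshold does not pass because the score must exceed it.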
1. Introduction



The Social Security Administration (SSA) is seeking vendors who can provide software that
can be used to build and access a file/data base using (customer) name as the access key. It
will be used to retrieve information from a file where the names are often
incomplete/truncated, unusually constructed, or misspelled. It will also be
used to match transaction records against a master name file/database. The software must
also be able to evaluate (score/rate) the strength of one name string against another, both in
on-line and batch processing. The software should also be flexible as to the data store in
which the names are stored, giving SSA flexibility as to the storage vehicle.

Language Analysis Systems, Inc. (LAS), offers a set of Application Programming Interfaces 
(APIs) that enhance automated solutions to name searching issues by internalizing knowledge 
about cultural variation in names. 

LAS implements a multifaceted approach to multicultural name searching. For example, in
the Hispanic culture, an individual typically has a compound family name (e.g., Arantxa
SANCHEZ VICARIO), the first of which (SANCHEZ) provides the more valuable identifying
information. In contrast, although Portuguese names also typically have compound family
names and look very similar to Hispanic names (e.g., Maria FERREIRA DOS SANTOS), the
second family name (DOS SANTOS) provides the more valuable identifying information. If a
single solution were proposed - where, for example, the Last Name is the important name, as
in American names - Hispanic names would not be adequately accommodated.

The LAS solution applies whatever resources will adequately address the problem at hand,
whether the variation is cross-cultural or arises from spelling variation, from transcription from
other writing systems, from sound similarity, or from missing or additional information.

Spelling Variations. Spelling variations can usually be addressed via character-matching
techniques (e.g., LESLEY, LESLIE). However, false positive matches can easily result from
traditional string or character comparisons when morphological endings such as OVIC occur
at the end of a name (e.g., ZELENOVIC, JOVANOVIC).

Transcription Issues. Transcription variation generates a unique set of issues that result
from different character sets, dialectal variations, and sounds that are not duplicated in Roman
script. A single Chinese character (ideogram) can be transcribed to produce numerous
Roman forms that have little or no resemblance to one another due to dialectal variations. For
example, few individuals would recognize that CHANG, JANG, and ZHANG are different
representations of the exact same Chinese name.

Sound Similarity. Names are often misheard or misrepresented as a result of pronunciation
and expected spelling. WOOSTER, WORCHESTER, and WUSTER may or may not be
pronounced identically and, depending on the pronunciation, an individual hearing the name
may expect a certain spelling representation. When sharing name data orally, both the
pronunciation by the speaker and the expectations of the listener may have significant impact
on the final representation of a name in a database system or written form.

Missing or Additional Data. Another common cause of name variation is the inclusion or
exclusion of name data. Depending on the data source, names may be formal such as
THOMAS EDWARD WINTHROP III, or informal such as TOM WINTHROP. A name search
system must be capable of relating these two names to one another regardless of whether all or
some portion of the name is available. Note that missing or additional data may include
valuable name segments (e.g., EDWARD in the example above), or less pertinent information
such as titles, prefixes, suffixes, or qualifiers (e.g., the qualifier III in the example above).

A single solution cannot address the range of problems posed by multicultural name 
searching. Neither the sound similarity in names such as SHAWN SMYTHE and SEAN 
SMITH, nor the transcription variation in IMHEMED BUCHLEIBI and MOHAMMED ABU 
SHLAysy can be easily handled by character-matching techniques. The differences are too 
great. Many search systems attempt to address these difficulties with equivalency lists or 
tables. While such lists can accommodate some of the most common variations, they are 
exceptionally limited, especially when it comes to random variation or error. 

Keyed retrieval - using Soundex-like keys, for example - may be able to level some of the 
differences, but most keys are based on variations found in English, and therefore, do not 
accommodate the variation typical of other languages; nor do they accommodate random 
errors. For example, a standard Soundex key on the name DOESCHER would be D226; for 
the similar name DOERSHER, the key would be D626. Because the keys do not match, 
retrieval of these similar names would NOT take place. 

The LAS Suite of Tools supplies the techniques necessary for complete and accurate retrieval 
of person, organization, and place name information. The LAS Suite of Tools is grounded in 
exacting cultural analysis and research, provides a broader and deeper search, and 
accommodates random variation. 

WorldSearch™ (referred to as SNAPI in the enclosed documentation) employs multiple
evaluation techniques to evaluate and score similar data. This tool determines whether two
names are similar and assigns a score indicating the probability that the two names are in fact
variations of one another. The tool incorporates information regarding variations in spelling,
discrepancy in the amount of information included, exclusion of expected information, and
positional information in order to establish a name score, which indicates the probability that
the two names represent the same individual. The tool also orders scored similar data based
on proximity rules. For example, an exact match should always appear at the top of any
ordered match list. Other variations are ordered based on variations in spelling, inclusion of
additional information, exclusion of expected information, and positional information.

WorldSearch™ can be used to match transaction records against a master name
file/database. It can also be used to evaluate (score/rate) the strength of one name string
against another, both in on-line and batch processing. WorldSearch™ is totally flexible as to
the data store in which the names are stored, thus providing SSA flexibility in their selection of
the data store. WorldSearch™ is extremely flexible and extensible, supporting more than 40
tune-able parameters, and the inclusion of additional data elements in the scoring mechanism,
as well as modification to the actual scoring mechanism itself to accommodate customer-
specific needs.

Prime Contact:

Leslie Minnix-Wolfe - Director of Technical Development
Language Analysis Systems, Inc.
2214 Rock Hill Road
Herndon, VA 20170
Phone: (703) 834-6200 x229
Fax: (703) 834-6230
Email: lmw@las-inc.com

Reference Information 

Jerry Cuffee, Office of Research and Development, (703) 613-8758
Sam Whitmer, US Department of State, (202) 663-1102
Jim Richardson, (703) 893-0427

2. Mandatory requirements of the software package must be: 

2.1 Field proven with a proven track record, currently commercially available, in use
in production environments at multiple customer sites. SSA must be able to contact
existing users; software that is in BETA testing or in development is not acceptable;

Earlier versions of WorldSearch™ are fielded at over 210 consular sites around the world in
support of the US Department of State. The latest, more advanced version of
WorldSearch™ is commercially available today.

2.2 Tunable/flexible in its ability to create data base keys for storing records in the 
creation and updating of the data base. Allow various ways to develop an access key 
to search the data base in order to retrieve data by a client's name. 

The current version of WorldSearch™ does not create data base keys for accessing records
in a data base. Keyed retrieval is inflexible by definition. For example, existing sound-based
keyed retrieval methods, such as Soundex and NYSIIS (a derivative of Soundex), are very
limited solutions. Using these keying techniques, two different names generate the same key,
and therefore would be retrieved together:



Name Soundex Key NYSIIS Key

SMOOT S530 SNAT

SMITH S530 SNAT

More importantly, the same name spelled two ways has two different codes, and therefore
would not be retrieved together: 



Name Soundex Key NYSIIS Key 

WUSTER W236 WASTAR 

WORCHESTER W622 WARCASTAR 
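
The Soundex column in the two tables above can be checked with a short C++ sketch of the standard American Soundex rules (keep the first letter, code the remaining consonants, skip vowels, do not let H or W separate equal codes, and pad to four characters). The function name is hypothetical and the sketch is not part of WorldSearch; NYSIIS is not covered here.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Standard American Soundex key for an ASCII name.
std::string soundex(const std::string& name) {
    // Digit codes for A..Z; '0' marks letters that are not coded.
    static const std::string codes = "01230120022455012623010202";
    std::string key;
    char previous = '0';
    for (char raw : name) {
        const unsigned char uch = static_cast<unsigned char>(raw);
        if (!std::isalpha(uch)) continue;  // ignore spaces and punctuation
        const char letter = static_cast<char>(std::toupper(uch));
        const char code = codes[letter - 'A'];
        if (key.empty()) {  // the first letter is kept as-is
            key += letter;
            previous = code;
            continue;
        }
        if (letter == 'H' || letter == 'W') continue;  // do not separate equal codes
        if (code != '0' && code != previous) key += code;
        previous = code;  // a vowel resets the previous code
        if (key.size() == 4) break;
    }
    if (!key.empty()) key.append(4 - key.size(), '0');  // pad short keys
    return key;
}
```

soundex("SMOOT") and soundex("SMITH") both give S530, while soundex("WUSTER") gives W236 and soundex("WORCHESTER") gives W622, which is the collision and the miss that the tables illustrate.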



Random errors and truncations are real problems for these and other keying strategies.
Existing keyed-retrieval systems address only one aspect of the name search problem,
thereby eliminating the possibility of returning some valid matches. They provide a one-size-fits-all
approach to a much more complex problem. Their approach also tends to be Anglo-centric,
which inhibits one's ability to address the issues of multi-cultural names, such as Hispanic,
Arabic, and American Indian names. Keys in general cannot accommodate random variation
and extreme spelling variations. WorldSearch™ promotes multiple data base sub-setting
strategies to work around the limitations inherent in keyed retrieval. These strategies
incorporate other data elements, as appropriate, and incorporate additional information about
the name, such as the cultural/ethnic origin.

Future versions of WorldSearch™ will incorporate, among other features, sophisticated
phonetic indexes as well as enhanced pre-processing of data to accommodate extreme 
spelling variations prior to index generation. Culture-specific indexing strategies will be 
incorporated to accommodate different cultural issues as well as the random errors that are 
concealed by sound-based keyed retrieval techniques like Soundex and its derivatives. Note 
that current plans for WorldSearch™ indexes include keyed indexes as well as non-key based
indexing (e.g., bitmap indexing). An initial offering of a single indexing (keyed or non-keyed) 
strategy should be commercially available in the first quarter of 1998. 

2.3 Allow adjustments to be made to the scoring mechanism. Ideally these changes 
should be done via initialization files, rather than package source code changes 
(which would result in customized variations of the original product). Allow for 
scoring based on the name string and Social Security Number string. 

Flexibility and extensibility are two of the principles upon which WorldSearch™ was
developed. There are over 40 tune-able parameters provided to enable adjustments in the
scoring mechanism. In addition, culture-specific packages of parameters are provided with
the tool to facilitate culture-specific handling of name issues. Applications can be constructed
to enable the end-user to make adjustments in an interactive mode or can override the default
parameter settings to accommodate customer-specific requirements in support of either or
both the interactive mode and a batch mode. For example, a batch process might be
established to compare two names using a "tight" search, and if no matches are found, a
subsequent process might be established to then compare the two names using a "loose"
search. Consider the following:

              Given name   Surname
Query:        Gerald       David
Data record:  David        Gerald

A "tight" search might be defined as one that considers the names in their specific surname
and given name format, and a "loose" version of that search might consider inversion of the
surname and given name. For more details, refer to the Developer's Documentation on the
SNQueryParms Class.
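
The tight-then-loose sequence can be sketched in C++ as follows. This is illustrative only and does not use the SNQueryParms Class: the PersonName type, the comparer signature, and the threshold parameter are all hypothetical stand-ins for whatever scoring call an application actually makes.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical name record with separate given-name and surname fields.
struct PersonName {
    std::string given;
    std::string surname;
};

// Any scoring function that rates two names on a 0.0 to 1.0 scale.
using NameComparer = std::function<double(const PersonName&, const PersonName&)>;

// Try the "tight" comparison first; if it does not pass the threshold,
// retry "loose" with the record's surname and given name inverted.
double tightThenLoose(const PersonName& query, const PersonName& record,
                      const NameComparer& compare, double threshold) {
    const double tight = compare(query, record);
    if (tight > threshold) return tight;
    const PersonName inverted{record.surname, record.given};
    return compare(query, inverted);
}
```

With the query "Gerald / David" and the data record "David / Gerald" shown above, the tight pass fails under an exact comparer and the loose pass succeeds once the record's fields are inverted.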

Additional data elements may also be integrated into the scoring mechanism to accommodate
scoring based on name data as well as any other desired data elements, such as Social
Security Number. For more details, refer to the Developer's Documentation on the
SNQueryNameData Class and SNEvalNameData Class.



2.4 Allow for transposed numbers and transposed letters. 

In addition to predictable variations in names, WorldSearch™ easily handles unpredictable
variations such as transposed letters (e.g., RODRIGUEZ = RODIGRUEZ), as well as other
random errors, such as truncation (e.g., CORNWALL = CORNW) and typos (e.g., BOMEZ =
GOMEZ).

2.5 Allow nicknames and derivative names to be scored as equal (e.g. Anthony =
Tony, Jose = Joseph = Joey = Giuseppe, etc). The package should have a built in
store of nicknames, derivatives and it should allow for customization of the nicknames
and derivatives. Allow equating names such as St = Saint

In addition to names with predictable similar spelling variations (e.g., GONZALEZ =
GONZALES), WorldSearch™ provides for very sophisticated handling of:

• predictable similar sounding, but different spelling variations (e.g., CRUZ = KRUSE
= CREWS = CRUISE);

• nicknames (e.g., ANTHONY = TONY);

• abbreviations (e.g., SAINT = ST.);

• gender differences (e.g., MARIA = MARIO);

• morphological endings (e.g., JOHNS = JOHNSON); and

• other derivative names.

WorldSearch™ differentiates between the different types of variations that occur and 
therefore does not simply score two variations as equal. Rather, it provides a finer level of
granularity in determining the degree of similarity between two name variations. As a result,
customization of these variations is not provided with the current version of the tool, as it 
requires rather extensive knowledge of name searching. Future versions of the tool may allow 
for customization, however. 

WorldSearch™ provides culture-specific sets of these values in order to handle cross-cultural
issues. For example, VAN might be considered a nickname for VANESSA or VANYA, but it is
also considered a prefix in Dutch names like VAN ROSSUM, and a gender marker in
Vietnamese names like VAN NGUYEN. Therefore, one might not want to consider VANESSA
NGUYEN as a match for THANH VAN NGUYEN, depending on the nature of the data. For
more details, refer to the Developer's Documentation on Variant processing.



2.6 Allow for "cleansing" (ignoring) titles (e.g. Dr., Jr., RN, Sr., etc.).

WorldSearch™ provides for very sophisticated handling of titles (e.g., DR., MR.), affixes
(prefixes (e.g., DE, LA, VAN) and suffixes (e.g., Aldin, Din)), and qualifiers (e.g., JR., SR., RN).
WorldSearch™ provides culture-specific sets of these values in order to handle cross-cultural
issues. For example, BEN is a common prefix in Arabic names like BEN GURION, but it is
also a common given name or nickname (e.g., BENJAMIN = BEN) in Anglo cultures.
WorldSearch™ does not simply ignore TAQ values, as in some cases these values provide
additional information when evaluating a candidate name. For example, if one is searching for
RICHARD ANTON UHRIG, JR. and finds RICHARD ANTON UHRIG, SR., depending on the
application, the Jr. and Sr. provide information that is valuable in determining whether these
two records match or not. WorldSearch™ provides the flexibility to decide how and when to
apply these more sophisticated scoring techniques. TAQ processing can be turned off entirely,
or turned on to simply ignore all TAQ values, or to score the TAQ values. For more details,
refer to the Developer's Documentation on TAQ processing.

2.7 Able to run on an IBM MVS/ESA compatible mainframe. "Callable" from batch or 
CICS/COBOL. 

WorldSearch™ is composed of one or more C++ APIs and is compatible with any modern
platform with a C++ compiler. There are several ways of making it callable from batch or
CICS/COBOL, but one of the
better approaches is to establish a Name Server which receives search requests from an
application, processes the request, and then returns the desired results to the calling
application. This approach provides more flexibility and extensibility to the Name Server to
support multiple application interfaces such as on-line versus batch. It also eliminates the
need to have a COBOL application become a COBOL/C application, which is clearly more
complex to develop as well as more difficult to maintain.

2.8 Work with a multi-segmented data base (containing millions of records per 
segment). The entire data base currently contains in excess of 200 million records. 

WorldSearch™ is entirely independent of the data store, and therefore can work with a
multi-segmented data base. Different strategies can be implemented to handle the large
volume of data.

2.9 Contain name match profiling/tuning/evaluation software as part of the suite of 
tools. 

WorldSearch™ is essentially a name match profiling/tuning/evaluation tool. It provides the
capability to evaluate and score name data. It also provides complete flexibility to tune the
evaluation mechanism and extensibility to incorporate additional information into the
evaluation algorithm(s).

In addition, consulting services are available to provide more extensive analysis and profiling
of the data, and subsequent tuning of the parameters and scoring techniques, as well as the 
incorporation of application-specific requirements. 

2.10 The "support package" must contain full documentation that is currently
available, pre-existing training programs and courses, customer support for
immediate consultation on technical problems/issues (NN hours per day, from xx-yy).

A full copy of the latest version of the WorldSearch™ Developer's Documentation is included
with this RFI response. This documentation is provided in HTML format and will soon be
available via the LAS web page.

A maximum of 40 hours of technical support is included with the base purchase price of the
product to assist with the initial understanding and use of the APIs.

With the purchase of an annual maintenance agreement, technical support is provided 24 
hours per day, 7 days a week. Technical support will provide on-going consultation to address 
technical problems/issues with the integration and use of the APIs. 

WELCOME

Welcome to the LAS SNAPI Product Support Site. The purpose of this web site is to enhance the
support services we provide to our customers. We have provided a number of resources here to help you
resolve problems, report bugs, and suggest improvements to our products and services.

You may also obtain technical support via either:

Telephone: (703) 834-6200

E-mail: snapi@las-inc.com

• Welcome: This page.

• Overview: General description of SNAPI from a developer's perspective. A good place to start.

• What's New: News and announcements.

• FAQs: Answers to frequently asked questions.

• Tutorial: Detailed explanation of how to write applications using SNAPI.

• API Documentation: Full API documentation.

• Sample Code: Source code illustrating SNAPI's most important features.

• Bugs: List of known problems.

• Suggestions: Tell us the features you would like to see in our next version.

• Download: Download source code, sample applications, demo versions.

• Search: Search the entire SNAPI site.



SNAPI is a trademark of Language Analysis Systems. All other products mentioned are registered trademarks or 
trademarks of their respective companies. 

Questions or problems regarding this web site should be directed to webmaster@las-inc.com.
Copyright © 1997 Language Analysis Systems. All rights reserved. 
Last modified: Friday November 21, 1997. 



Developer's Overview 






This section gives a basic and brief overview of the SNAPI system. For a more detailed explanation of
how to use the API, please see the Tutorial. Once you have read the overview, see Where To Go From
Here for some suggestions on where to go next.

Developer's Overview 

SNAPI (Smart Name API) is a set of programming libraries (functions and classes) that enables a
developer to add fuzzy personal name searching to an application. It gives you the capability to perform
operations such as "Give me the 10 closest names to 'James Slesinger' from my database", "Give me
all the names from my database that match 'John Wong' with a degree of confidence of 0.9", or "Tell me
the degree of similarity between 'Paul Vanesann' and 'P. Vanlesann'". The system uses a variety of
linguistic techniques to achieve solid, dependable results.
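
The "10 closest names" and "confidence of 0.9" operations above both reduce to filtering and ordering a set of already scored candidates. The sketch below illustrates that idea only. It does not use the SNAPI classes; ScoredCandidateSketch, higherScoreFirst, and topMatchesSketch are hypothetical names invented for this illustration, and the scores are assumed to have been produced elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record: a candidate name plus a score in the range 0.0 to 1.0.
struct ScoredCandidateSketch {
    std::string name;
    double score;
};

// Ordering predicate: higher scores sort first.
static bool higherScoreFirst(const ScoredCandidateSketch &a,
                             const ScoredCandidateSketch &b)
{
    return a.score > b.score;
}

// Keeps the candidates whose score meets the threshold, orders them best
// first, and truncates the list to at most maxHits entries.
std::vector<ScoredCandidateSketch> topMatchesSketch(
    const std::vector<ScoredCandidateSketch> &scored,
    double threshold,
    std::size_t maxHits)
{
    assert(threshold >= 0.0 && threshold <= 1.0);

    std::vector<ScoredCandidateSketch> hits;
    for (std::size_t i = 0; i < scored.size(); ++i) {
        if (scored[i].score >= threshold)
            hits.push_back(scored[i]);
    }
    std::stable_sort(hits.begin(), hits.end(), higherScoreFirst);
    if (hits.size() > maxHits)
        hits.resize(maxHits);
    return hits;
}
```

Calling topMatchesSketch(pool, 0.9, 10) would correspond to "the 10 closest names with a confidence of at least 0.9"; a threshold of 0.0 returns the best maxHits names regardless of score.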

The libraries are coded in C++, and can be easily integrated into any application written in C++. SNAPI
is available on any platform that supports a C++ compiler. The SNAPI system was designed with the
following goals: simplicity and ease of integration, maximum flexibility, and maximum extensibility.



Simplicity and Ease of Integration 

From the developer perspective, the SNAPI system is quite simple. A typical name search requires the 
use of just four classes (SNQueryParms, SNQueryNameData, SNEvalNameData, and SNResultsList). In 
addition, the extra code required to integrate SNAPI is minimal. Both the code snippet in the Tutorial, 
and the code samples illustrate this point. 

SNAPI's interface is simplified by the fact that it makes no assumptions about your data and how it is 
stored. The philosophy behind our product is that you know your data better than anyone else. This 
allows for a much cleaner design - You provide the name you are looking for, as well as the names from 
your database. The product tells you which names are likely matches, and qualifies their degree of 
similarity. Behind the scenes, the process is much more complex, but from the perspective of the 
developer, the tool appears straight-forward and easy to integrate. 



Flexibility 

Searches via the SNAPI system are configurable by adjusting any of 43 parameters. Each parameter 
controls some aspect of how two names are evaluated when determining if they are similar. Some of the 
more basic parameters set thresholds for determining how close two names must be in order to be 
considered a match. Other parameters control more complex processing, such as how to handle 
multi-segment names. In general, only a small set of parameters need to be adjusted by the developer, 
because reasonable defaults exist for each one. Documentation for the SNQueryParms class discusses 
each of the parameters. 

SNAPI also provides pre-defined packages of parameters, each tailored to a particular culture or
ethnicity. For example, Hispanic names have certain characteristics such as compound surnames (e.g.,
Torres de la Cruz) that can cause problems when searching for Hispanic names using conventional
methods, which are typically Anglo-centric. The Hispanic parameters package contains settings that
address Hispanic-specific name issues. New cultural/ethnic parameter packages can be established and
existing packages can be modified as desired. The SNQueryParms constructor describes the various
parameter packages available.

Extensibility 

Because SNAPI is a C++ object framework, developers can extend the existing functionality to
incorporate additional data elements in the scoring algorithm or create evaluation methods specific to
their business or application needs. For example, a database might contain Social Security Number in
addition to given name and surname. SNAPI only provides for comparisons of name data. However, a
developer can take advantage of class inheritance (a feature of C++), and easily subclass SNAPI's
SNEvalNameData and SNQueryNameData objects to include SSN or any other desired data element(s).
This data can then be used in the methods that score evaluation names, and determine which evaluation
names are matches. In other words, record matching can be performed using name data in conjunction
with other available data element information.

Developers can also provide custom methods for determining if an evaluation name matches a query 
name or not. SNAPI's default method compares the average of the given name score and surname score 
to a developer supplied threshold value. However, a more complex method may be desired. For 
example, the business rules of an application might dictate that a name can not be considered a match 
unless either the surname or given name is an exact match. By overriding SNAPI's default method, the 
developer can easily implement this logic in just a few lines of code. 
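
As a rough illustration of the override described above, the sketch below uses small stand-in classes rather than the real SNAPI headers. EvalNameSketch, ExactFieldRuleSketch, MatchResultSketch, and their members are hypothetical names invented here. Two assumptions are also made: that a component score of 1.0 denotes an exact match, and that the default rule is a plain average compared against a single threshold, as the paragraph states.

```cpp
#include <cassert>

// Hypothetical stand-in for SNAPI's match/no-match return codes.
enum MatchResultSketch { SKETCH_NO_MATCH = 0, SKETCH_MATCH = 1 };

// Hypothetical stand-in for an already scored evaluation name.
class EvalNameSketch {
public:
    EvalNameSketch(double gn, double sn, double thresh)
        : gnScore(gn), snScore(sn), threshold(thresh)
    {
        assert(thresh >= 0.0 && thresh <= 1.0);
    }
    virtual ~EvalNameSketch() {}

    // Default rule from the text: average of the given name and surname
    // scores compared to a developer-supplied threshold.
    virtual MatchResultSketch getCompResult() const
    {
        double average = (gnScore + snScore) / 2.0;
        return (average >= threshold) ? SKETCH_MATCH : SKETCH_NO_MATCH;
    }

protected:
    double gnScore;
    double snScore;
    double threshold;
};

// Business rule from the text: no match unless either the surname or the
// given name matched exactly (assumed here to mean a score of 1.0).
class ExactFieldRuleSketch : public EvalNameSketch {
public:
    ExactFieldRuleSketch(double gn, double sn, double thresh)
        : EvalNameSketch(gn, sn, thresh) {}

    virtual MatchResultSketch getCompResult() const
    {
        bool oneFieldExact = (gnScore == 1.0) || (snScore == 1.0);
        if (!oneFieldExact)
            return SKETCH_NO_MATCH;
        return EvalNameSketch::getCompResult();  // then apply the default rule
    }
};
```

The override only adds a precondition and then defers to the base class, which keeps the custom rule to a few lines.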



Where To Go From Here 

Now that you have a basic understanding of what the SNAPI API provides, we recommend proceeding
to the Tutorial. There, you will find several "code snippets" that demonstrate how to use the SNAPI
objects. From there, you can reference the API documentation for a more detailed discussion of the
classes and methods. Alternatively, you can view the FAQ lists to search for the answer to a particular
question.




SNEvalNameData Class

• Class Overview 

• Subclassing 

• Methods Summary 

• Attributes

• Construction 

• Method Details 



Overview 

A subclass of SNNameData, SNEvalNameData represents a candidate name that will be 
compared against a query name (an SNQueryNameData object). Built on top of 
SNNameData, it adds the data necessary to keep track of scores resulting from a 
comparison. In addition, it adds a method to perform a comparison between itself and an 
SNQueryNameData object. 

The developer may subclass this class to add any data that might be useful to attach to a
candidate name. This practice becomes important if the developer is using an SNResultsList
to manage the hits during a session, because they will probably need some unique way
(within their system) to identify the SNEvalNameData objects that end up in the results list
after all names have been evaluated. A common example is a subclass that adds a database
record ID field. Once all candidate name objects have been processed, the results list can be
queried to obtain those objects that are considered matches. Each object that was considered
a match can then be queried to obtain its database record ID.

The developer is responsible for deleting any SNEvalNameData created by their code.
Typically the developer will construct a new SNEvalNameData object, compare it to the
query name (an SNQueryNameData object), and then delete the SNEvalNameData object.
Before deleting the object, the developer may wish to examine the scores that result from
the comparison (e.g., getNameScore(), getSnScore(), etc.). If an SNResultsList object is
being used in the query process, the developer can safely delete the SNEvalNameData
object after the comparison, because the SNResultsList object makes copies of the objects it
manages. See the SNResultsList documentation for a more detailed discussion.



Subclassing 

Developers may wish to subclass SNEvalNameData for a variety of reasons. The most
common need for subclassing is to allow application-specific data to be attached to each
evaluation name. For example, an application might read candidate names from a database,
where each name consists of a given name, surname, unique record ID, and birth date.
However, SNEvalNameData only knows about given name and surname, and is oblivious to
record ID and birth date. By subclassing SNEvalNameData, a developer can add these or
any other data elements. This method of tagging candidate name objects becomes important
when an SNResultsList is used to manage hits. In this case, some method is needed to link
the objects returned from the SNResultsList back to their associated data in the original data
source.

Subclassing can also allow an application to use extra data to affect the search and
evaluation process. Continuing the example above, suppose a developer wants to include
birth date as a factor in the search. The developer, having subclassed SNEvalNameData, can
override the calcNameScore() method to include age differential (how close in age the
candidate is to the query) in the evaluation method.
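
A minimal sketch of this kind of subclass follows, written against stand-in classes so that it compiles without the SNAPI headers. EvalNameBaseSketch and DatedEvalNameSketch are hypothetical names, the plain-average base score is an assumption, and the penalty of 2 percent per year of age difference (capped at 10 years) is an arbitrary choice made only to show where the extra data enters the calculation.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for SNEvalNameData; the real class lives in the SNAPI headers.
class EvalNameBaseSketch {
public:
    EvalNameBaseSketch(double gn, double sn)
        : gnScore(gn), snScore(sn), nameScore(0.0) {}
    virtual ~EvalNameBaseSketch() {}

    // Assumed default: a plain average of the two component scores.
    virtual void calcNameScore() { nameScore = (gnScore + snScore) / 2.0; }
    double getNameScore() const { return nameScore; }

protected:
    double gnScore;
    double snScore;
    double nameScore;
};

// Subclass that tags the candidate with a record ID and lets the birth year
// influence the composite score.
class DatedEvalNameSketch : public EvalNameBaseSketch {
public:
    DatedEvalNameSketch(double gn, double sn, long recId,
                        int candidateYear, int queryYear)
        : EvalNameBaseSketch(gn, sn), recordId(recId),
          birthYear(candidateYear), queryBirthYear(queryYear) {}

    virtual void calcNameScore()
    {
        EvalNameBaseSketch::calcNameScore();      // start from the base score
        int gap = std::abs(birthYear - queryBirthYear);
        if (gap > 10)
            gap = 10;                             // cap the penalty at 10 years
        nameScore *= 1.0 - 0.02 * gap;            // at most a 20 percent reduction
    }

    // Links a hit back to its row in the original data source.
    long getRecordId() const { return recordId; }

private:
    long recordId;
    int birthYear;
    int queryBirthYear;
};
```

Because the adjustment only ever scales the score down, the result stays within the 0.0 to 1.0 range expected of a name score.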

The following methods can be overridden to provide specialized processing in a
subclass:

~SNEvalNameData()        Destructor for the class. Ensure that the destructor for your
                         subclass frees any resources your subclass allocates.

calcComponentScores()    Calculates the name field (given name and surname) scores for
                         an evaluation name. Override this if you wish to calculate scores
                         using application-specific data.

calcNameScore()          Determines the composite name score for an evaluation name.
                         The composite score incorporates the component scores into a
                         single value. Override this if you wish to incorporate
                         application-specific data into the name score calculation.

compareScore()           Compares two scored evaluation names. Override this if you
                         wish to change the way evaluation names are sorted within an
                         SNResultsList.

getCompResult()          Determines if a scored evaluation name is a match or not.
                         Override this if you wish to incorporate application-specific data
                         into the "match/no match" decision process.

resetScores()            Resets scores within the class. Override this if you are
                         pre-loading evaluation names and have added additional score
                         variables that need to be reset before performing a comparison.



Methods Summary 



Common Methods: 



SNEvalNameData()         Various constructors for the class.

getGnScore()             Returns the given name score after a comparison.

getNameScore()           Returns the composite name score after a comparison.

getSnScore()             Returns the surname score after a comparison.

performComp()            Compares this object to an SNQueryNameData object.
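
Specialized Methods:

calcComponentScores()    Calculates the component scores for an evaluation name. Not
                         called directly by the developer.

calcNameScore()          Calculates the composite name score for an evaluation name.
                         Not called directly by the developer.

compareScore()           Compares this scored object to another scored
                         SNEvalNameData object. Not called directly by the developer.

getCompResult()          Determines if the evaluation name is a match or not. Not
                         called directly by the developer.

resetScores()            Resets the scores within the object. Useful for applications that
                         load pre-processed representations of the names in their
                         database into memory for increased performance.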



Attributes:

Each attribute below is a protected member, which can be accessed directly by subclasses of
SNEvalNameData.

double gnScore;            The given name score resulting from a comparison.

double snScore;            The surname score resulting from a comparison.

double nameScore;          The composite name score resulting from a comparison.

int gnSegDifferential;     The difference between the number of given name segments in
                           this object's name and in the query name.

int snSegDifferential;     The difference between the number of surname segments in
                           this object's name and in the query name.

double gnSegScores[];      The scores of the individual given name segments resulting
                           from a comparison.

double snSegScores[];      The scores of the individual surname segments resulting
                           from a comparison.

Method Details:
Constructors:

The first form of the constructor takes the given name and surname as separate fields. It is
the most efficient form, because it corresponds to SNAPI's name model.

The second form accommodates systems that store each name in a single field. The name is
split into segments, and the segments not assigned to the surname are placed in the given
name field. A name with just one segment is a special case, since only one of the two fields
can be populated. The parsing is intelligent in that TAQ values are recognized: a name
ending in "Jones Jr" is given a surname of "Jones Jr" rather than assigning "Jr" as the
surname.

Future versions of SNAPI will incorporate more sophisticated techniques for automated
decisions about parsing names into the appropriate name fields.

Parameters:

qParms

Return Values:
None.

Memory Management:

The exception to

SNEvalNameData *candidate1;
SNEvalNameData *candidate2;
SNEvalNameData *candidate3;
SNEvalNameData *candidate4;
SNQueryParms *queryParms;




double getGnScore();

Returns the given name score calculated during the performComp() method.

Parameters:
None.
Return Values:

A double value between 0.0 and 1.0, indicating how closely the given names of
the query and candidate names matched.

Examples:

The example below shows a comparison and subsequent given name score
examination:

SNEvalNameData *candidate;
SNQueryNameData *queryName;
SNQueryParms *queryParms = new SNQueryParms (SN_PARMS_GENERIC);
double gnScore;

candidate = new SNEvalNameData (queryParms, "Bob Earl", "Jones");
queryName = new SNQueryNameData (queryParms, "James Earl", "Jones");

candidate->performComp (queryName);

gnScore = candidate->getGnScore ();

printf ("Given Names Matched with a score of %f", gnScore);

// delete the allocated objects somewhere below.






double getNameScore();

Returns the name score calculated during the performComp() method. The name score is a
function of the given name score, the surname score, and the given name and surname
weights. If called before performComp() is invoked, the result is undefined.

Parameters: 
None. 
Return Values: 

A double value between 0.0 and 1 .0, indicating how closely the query and 
candidate names matched. 

Examples: 

The example below shows a comparison and subsequent name score 
examination; 

SNEvalNameData *candidate;
SNQueryNameData *queryName;
SNQueryParms *queryParms = new SNQueryParms (SN_PARMS_GENERIC);
double nameScore;

candidate = new SNEvalNameData (queryParms, "Bob Earl", "Jones");
queryName = new SNQueryNameData (queryParms, "James Earl", "Jones");

candidate->performComp (queryName);

nameScore = candidate->getNameScore ();
printf ("Names Matched with a score of %f", nameScore);

// delete the allocated objects somewhere below.
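
The description of getNameScore() says only that the name score depends on the given name score, the surname score, and the two weights. One plausible form is a weighted average, sketched below. This is an assumption made for illustration, not the formula SNAPI uses, and weightedNameScoreSketch is a hypothetical helper, not part of the API.

```cpp
#include <cassert>

// Hypothetical helper: combines two component scores (each 0.0 to 1.0) into
// one composite score using non-negative weights. With equal weights this
// reduces to the plain average of the two scores.
double weightedNameScoreSketch(double gnScore, double snScore,
                               double gnWeight, double snWeight)
{
    // The weights must be non-negative and must not both be zero.
    assert(gnWeight >= 0.0 && snWeight >= 0.0);
    assert(gnWeight + snWeight > 0.0);

    return (gnScore * gnWeight + snScore * snWeight) / (gnWeight + snWeight);
}
```

Dividing by the sum of the weights keeps the result between 0.0 and 1.0 whenever both component scores are in that range, which matches the documented range of the return value.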



double getSnScore();



Returns the surname score calculated during the performComp() method. If called before
performComp() is invoked, the result is undefined.

Unlike getNameScore(), which gives a score for the name as a whole,
getSnScore() allows the developer to examine the surname separately and is provided for
those applications that require special consideration of the surname.

Parameters: 
None. 
Return Values:

A double value between 0.0 and 1.0, indicating how closely the surname(s) of
the query and candidate names matched.

Examples: 

The example below shows a comparison and subsequent surname score 
examination: 



SNEvalNameData *candidate;
SNQueryNameData *queryName;
SNQueryParms *queryParms = new SNQueryParms (SN_PARMS_GENERIC);
double snScore;

candidate = new SNEvalNameData (queryParms, "Bob Earl", "Jones");
queryName = new SNQueryNameData (queryParms, "James Earl", "Jones");

candidate->performComp (queryName);

snScore = candidate->getSnScore ();

printf ("Surnames Matched with a score of %f", snScore);
// delete the allocated objects somewhere below.



SNReturnCode performComp (SNQueryNameData *queryName);



Compares this evaluation name object to a query name object (SNQueryNameData). On
return from this method, score information can be retrieved from this object using methods
such as getGnScore(), getNameScore(), etc. The comparison is conducted according to the
parameters specified in the SNQueryParms object that was used to construct this
SNEvalNameData object.



Parameters: 



queryName A pointer to an SNQueryNameData object. This object is a representation of
the query name, and should be constructed with the same SNQueryParms
object that was used to create this SNEvalNameData object.



Return Values:

An SNReturnCode value indicating the result of the comparison. Values
include SN_MATCH and SN_NO_MATCH, but the return code can also
indicate a variety of errors. See the documentation for SNReturnCode for full
details.



Examples: 

The example below shows a sample comparison: 

SNEvalNameData *candidate;
SNQueryNameData *queryName;
SNQueryParms *queryParms = new SNQueryParms (SN_PARMS_GENERIC);
SNReturnCode retCode;

candidate = new SNEvalNameData (queryParms, "Bob Earl", "Jones");
queryName = new SNQueryNameData (queryParms, "James Earl", "Jones");

retCode = candidate->performComp (queryName);
if (retCode == SN_MATCH)
    printf ("Names Matched");
else if (retCode == SN_NO_MATCH)
    printf ("No Match");
else
    printf ("Error!");
// delete the allocated objects somewhere below.



virtual inline void calcComponentScores (SNQueryNameData *queryName);

Calculates the component scores for the evaluation name. This function is called by the API,
not by the developer. Specifically, it is called by the performComp() method before the
composite name score is calculated (via a call to calcNameScore()).

The method is virtual to allow subclasses of SNEvalNameData to provide score calculations 
for any application-specific data the developer may have added to the evaluation name. The 
default method calculates scores for the given name and surname components. Subclasses 
must call the base class implementation so that the given name and surname scores are set 
properly. 



Parameters: 



queryName A pointer to a SNQueryNameData object. This object is a representation of 
the query name, and should be constructed with the same SNQueryParms 
object that was used to create this SNEvalNameData object. 



Return Values: 

None. 
Examples: 

The example below shows a sample override of the 
calcComponentScores () method. In the example, we have defined a 
subclass of SNEvalNameData called MySNEvalNameData. This class 
includes an SSN data member, and a Boolean ssnMatch flag that should 
be set when the query and evaluation name have the same SSN. 



void MySNEvalNameData::calcComponentScores (SNQueryNameData *queryName)
{
    // have to call the base class implementation
    SNEvalNameData::calcComponentScores (queryName);

    // assumes the query name object also exposes an ssn member
    if (ssn == queryName->ssn) {
        ssnMatch = TRUE;
    }
    else {
        ssnMatch = FALSE;
    }
}



virtual void calcNameScore ();

Calculates the composite name score, placing the result in the member variable nameScore.
This function is called by the API, not by the developer. The method is virtual to allow
subclasses of SNEvalNameData to incorporate other data and/or logic in the calculation.

The full implementation of SNEvalNameData::calcNameScore() appears in the
SNEvalNameData.hpp header file. This gives the developer insight into how a customized
method might incorporate additional data.

On exit from this function, the nameScore member variable should hold a value
between 0.0 and 1.0.

Parameters: 

None. 
Return Values: 

None. 
Examples: 

The example below shows a sample override of the calcNameScore()
method. In the example, we have defined a subclass of SNEvalNameData
called MySNEvalNameData. This class includes a bornYear member
variable. The sample calls the base class implementation, and then
gives special consideration to people born before 1900. A more
complicated example might replace the base class implementation
entirely.



void MySNEvalNameData::calcNameScore ()
{
    SNEvalNameData::calcNameScore ();

    if (bornYear < 1900) {
        nameScore *= 1.1; // give an extra 10 percent on the score.

        if (nameScore > 1.0) // make sure we do not exceed a perfect score.
            nameScore = 1.0;
    }
}



virtual int compareScore (SNEvalNameData *scoredName);

Compares the scored SNEvalNameData object to a second scored SNEvalNameData object.
This function is called by the API, not by the developer. Specifically, it is called by an
SNResultsList object to determine the sort order of the matches it manages.

The method is virtual to allow subclasses of SNEvalNameData to incorporate other data
and/or logic in the sorting process. In general, applications can override the
calcNameScore() method to incorporate application-specific data into the calculation of the
name score. The name score is the most important factor in the default sort method, so
proper sorting occurs automatically. Because the compareScore() method is somewhat
complex, overriding calcNameScore() is the preferred method.

However, there may be times when a more detailed modification of the sort method is
required. For example, the developer may wish to introduce a data element that does not
affect the name score at all, but does affect the sort order of any matches. In these cases,
override the compareScore() method. The full implementation of
SNEvalNameData::compareScore() appears in the SNEvalNameData.hpp header file. This
gives the developer insight into how a customized method might incorporate additional
data. The method is complex enough to warrant a brief discussion of its behavior.
(Discussion can also be found in the implementation of the function.)

In general, the compareScore() method performs a series of comparisons to determine which
evaluation name is better (i.e., closer to the query name). The comparisons occur in
descending order of importance. If any comparison yields a discrepancy, the comparison
stops there. Otherwise, we proceed to the next comparison. The order of comparisons is as
follows:

nameScore
snScore
if (snSegmentScoreMode == HIGHEST)
    snSegmentScores
gnScore
if (gnSegmentScoreMode == HIGHEST)
    gnSegmentScores
snSegDiff (the difference in the number of sn segments between the query and the evaluation name)
gnSegDiff (the difference in the number of gn segments between the query and the evaluation name)
An override of compareScore() would insert a comparison of some application-specific data
at the desired point. For example, our subclass might include a Boolean flag indicating if
this name's Social Security Number matched that of the query name exactly. Further,
suppose our business rules dictate that all exact SSN matches should appear at the top of the
results, regardless of name score. In this case, we would perform a comparison of the
Boolean flag prior to checking the name score:

if (ssnMatch || scoredName->ssnMatch) {
    if (!ssnMatch)
        return -1; // the scoredName is better, since it's an exact SSN match
    else if (!scoredName->ssnMatch)
        return 1;  // this name is better, since it's an exact SSN match
}

// proceed with rest of default comparison, since both (or neither) were an exact SSN match.

In our contrived example, it would have been possible to just perform our check, and in the
event of a tie, call the base class implementation. If our desired insertion point had been
somewhere in the middle of the comparison order, we would be forced to provide a full
version of the method. The example below demonstrates this.



Parameters: 
scoredName A pointer to an SNEvalNameData object. This object is a representation of
another evaluation name that has already been scored.



Return Values: 

1 if this evaluation name is a better match than the supplied evaluation name. 
-1 if the supplied evaluation name is a better match than this evaluation name. 
0 if both names match the query name with the same degree of confidence. 
Examples: 

The example below shows an override of the compareScore() method. The
example supposes a subclass of SNEvalNameData that introduces an 
integer variable, ssnScore, which is a number between 0 and 9, 
indicating how many digits matched between the query and evaluation 
name's SSN. Suppose we want the SSN score to be considered after the 
overall name score, but before a comparison of the surname and given 
name scores. The resulting method looks a lot like the base class 
implementation, but we have inserted a comparison of the ssnScore 
just after the nameScore comparison: 



int MySNEvalNameData::compareScore (SNEvalNameData *scoredName)
{
    int rc;
    double scoreDiff = scoredName->getNameScore () - nameScore;

    if (scoreDiff < 0.0)
        rc = -1;
    else if (scoreDiff > 0.0)
        rc = 1;
    else {
        scoreDiff = scoredName->ssnScore - ssnScore;   // <-- inserted comparison
        if (scoreDiff < 0.0)
            rc = -1;
        else if (scoreDiff > 0.0)
            rc = 1;
        else {                                         // <-- end of inserted comparison
            // scores were the same, so look at snScore
            scoreDiff = scoredName->getSnScore () - snScore;
            if (scoreDiff < 0.0)
                rc = -1;
            else if (scoreDiff > 0.0)
                rc = 1;
            else {
                // see if our snSegmentScoreMode mode is HIGHEST.
                // If it is, we need to check the sn segment scores
                if (queryParms->getSnSegmentScoreMode () == SN_SEGMODE_HIGHEST)
                    scoreDiff = compareSegmentScores (scoredName, SN_LAST_NAME);

                // see if we still are equal after the above check
                if (scoreDiff < 0.0)
                    rc = -1;
                else if (scoreDiff > 0.0)
                    rc = 1;
                else {
                    // scores were the same, so look at gnScore
                    scoreDiff = scoredName->getGnScore () - gnScore;
                    if (scoreDiff < 0.0)
                        rc = -1;
                    else if (scoreDiff > 0.0)
                        rc = 1;
                    else {
                        if (queryParms->getGnSegmentScoreMode () == SN_SEGMODE_HIGHEST)
                            scoreDiff = compareSegmentScores (scoredName, SN_FIRST_NAME);

                        // see if we still are equal after the above check
                        if (scoreDiff < 0.0)
                            rc = -1;
                        else if (scoreDiff > 0.0)
                            rc = 1;
                        else {
                            int segDiff;

                            // scores were the same, so look at snSegDifferential
                            // for this case, smaller is better, so switch operands
                            segDiff = snSegDifferential - scoredName->snSegDifferential;
                            if (segDiff < 0)
                                rc = -1;
                            else if (segDiff > 0)
                                rc = 1;
                            else {
                                // scores were the same, so look at gnSegDifferential
                                // for this case, smaller is better, so switch operands
                                segDiff = gnSegDifferential - scoredName->gnSegDifferential;
                                if (segDiff < 0)
                                    rc = -1;
                                else if (segDiff > 0)
                                    rc = 1;
                                else
                                    rc = 0;
                            }
                        }
                    }
                }
            }
        }
    }

    return rc;
}



virtual SNReturnCode getCompResult ();



Determines if the SNEvalNameData object is considered a match or not. This function is
called by the API, not by the developer. Specifically, it is called during the performComp()
method after the name has been scored.

The method is virtual to allow subclasses of SNEvalNameData to incorporate other data
and/or logic in the match determination process. For example, an application may wish to
reduce a threshold depending on some application-specific data.

The full implementation of SNEvalNameData::getCompResult() appears in the
SNEvalNameData.hpp header file. This gives the developer insight into how a customized
method might incorporate additional data. The default method checks to see if the scores
(gnScore, snScore, and nameScore) meet or exceed their respective thresholds. The
thresholds are set via the SNQueryParms object associated with this evaluation name.

Parameters:

None.
Return Values:

An SNReturnCode value of either SN_MATCH or SN_NO_MATCH.
Examples: 

The example below shows an override of the
SNEvalNameData::getCompResult() method. Our example assumes a
subclass of SNEvalNameData, then adds a Boolean ssnMatch flag that is
set to true if the query and evaluation name SSNs match exactly. In
our override, we reduce the thresholds by 10 percent when the
ssnMatch flag is true.



SNReturnCode MySNEvalNameData::getCompResult()
{
    SNReturnCode retCode;

    double adjustedGnScoreThresh   = queryParms->getGnScoreThresh();
    double adjustedSnScoreThresh   = queryParms->getSnScoreThresh();
    double adjustedNameScoreThresh = queryParms->getNameScoreThresh();

    if (ssnMatch) {    // adjust the thresholds if we have an exact ssn match
        adjustedGnScoreThresh   *= 0.9;
        adjustedSnScoreThresh   *= 0.9;
        adjustedNameScoreThresh *= 0.9;
    }

    if ((nameScore >= adjustedNameScoreThresh) &&
        (gnScore >= adjustedGnScoreThresh) &&
        (snScore >= adjustedSnScoreThresh))
        retCode = SN_MATCH;
    else
        retCode = SN_NO_MATCH;

    return retCode;
}



virtual void resetScores()



Clears out the scores associated with the SNEvalNameData object. Developers who wish to
reuse SNEvalNameData objects for multiple queries must call this function before each call
to performComp(). If an application creates and deletes SNEvalNameData objects for each
query it processes, this function is not necessary.

When subclassing SNEvalNameData, you should override this method to reset any scoring 
variables you have added. In doing so, be sure to call the base class's implementation. 

Parameters: 

None. 
Return Values: 

None.
Examples:

The example below shows the same SNEvalNameData object being used in
two separate queries. Note the call to resetScores() before the
second query is performed.



SNEvalNameData  *candidate;
SNQueryNameData *queryName1;
SNQueryNameData *queryName2;
SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
int snSegDifferential;

candidate  = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
queryName1 = new SNQueryNameData(queryParms, "James Earl", "Jones");

candidate->performComp(queryName1);

candidate->resetScores();

queryName2 = new SNQueryNameData(queryParms, "Jimmy", "Jones");
candidate->performComp(queryName2);

// delete the allocated objects somewhere below.
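
The advice about overriding resetScores() in a subclass can be illustrated with a small, self-contained sketch. It does not use the real SNAPI headers: EvalNameSketch is a hypothetical stand-in for SNEvalNameData, the ssnScore member is an invented application-specific score, and clearing scores to 0.0 is an assumption about what "cleared" means.

```cpp
// Hypothetical stand-in for SNEvalNameData; only the parts needed here.
class EvalNameSketch {
public:
    virtual ~EvalNameSketch() {}

    // Clears the scores owned by the base class (0.0 is an assumed value).
    virtual void resetScores() {
        gnScore = 0.0;
        snScore = 0.0;
        nameScore = 0.0;
    }

    double gnScore = 0.0;
    double snScore = 0.0;
    double nameScore = 0.0;
};

// Subclass that adds its own scoring variable.
class SsnEvalNameSketch : public EvalNameSketch {
public:
    void resetScores() override {
        EvalNameSketch::resetScores();   // let the base class clear its scores first
        ssnScore = 0.0;                  // then clear what this subclass added
    }

    double ssnScore = 0.0;
};
```

Because resetScores() is virtual, code that holds only a base-class pointer still clears the subclass's scores.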



SNAPI is a trademark of Language Analysis Systems. All other products mentioned are registered trademarks or
trademarks of their respective companies.

Questions or problems regarding this web site should be directed to webmaster@las-inc.com.
Copyright © 1997 Language Analysis Systems. All rights reserved.
Last modified: Tuesday November 25, 1997.



Overview

SNNameData encapsulates the basic information required to describe a name. It is the base
class for both SNEvalNameData and SNQueryNameData.

The developer never instantiates an object of this class. However, SNNameData defines 
some members and member functions that are useful to applications, and they are 
documented here. 



Subclassing 

Applications should not subclass from SNNameData directly. Instead, subclasses should be 
derived from SNEvalNameData and SNQueryNameData as appropriate. 



Methods Summary 



Common Methods: 



getGn()    Returns the given name, in its original case.
getSn()    Returns the surname, in its original case.



Attributes:

char gn[];                 A NULL-terminated array of characters that holds the original given
                           name, in its original case. The array is large enough to hold
                           SN_MAX_GN_LEN characters.

SNQueryParms *queryParms;  A pointer to the SNQueryParms object used to create this name
                           object.

char sn[];                 A NULL-terminated array of characters that holds the original
                           surname, in its original case. The array is large enough to hold
                           SN_MAX_SN_LEN characters.



Method Details: 



char * getGn();

Returns the given name, in its original case. This is primarily a convenience method, but it
can also be used to determine how the API separated a single name string into separate
given name and surname fields.

Parameters: 

None. 
Return Values: 

The given name as a NULL terminated string. 
Examples: 

The example below shows the construction of an SNEvalNameData object 
and a subsequent given name examination: 

SNEvalNameData *candidate;
SNQueryParms   *queryParms = new SNQueryParms(SN_PARMS_GENERIC);

candidate = new SNEvalNameData(queryParms, "Bob Earl Jones Jr",
                               SN_LAST_SEG_IS_SURNAME);

printf("The given name was %s\n", candidate->getGn());

// delete the allocated objects somewhere below.

char * getSn();

Returns the surname, in its original case. This is primarily a convenience method, but it can
also be used to determine how the API separated a single name string into separate given
name and surname fields.

Parameters:

None.
Return Values:

The surname as a NULL terminated string.
Examples: 

The example below shows the construction of an SNEvalNameData object 
and a subsequent surname examination: 

SNEvalNameData *candidate;
SNQueryParms   *queryParms = new SNQueryParms(SN_PARMS_GENERIC);

candidate = new SNEvalNameData(queryParms, "Bob Earl Jones Jr",
                               SN_LAST_SEG_IS_SURNAME);

printf("The surname was %s\n", candidate->getSn());

// delete the allocated objects somewhere below.



Overview

A subclass of SNNameData, SNQueryNameData represents a query name that will be
compared against many evaluation names (SNEvalNameData objects). Built on top of
SNNameData, SNQueryNameData adds a mechanism to manage results (an SNResultsList
object) through a series of comparisons. In addition, it adds variables to store pre-processing
information about the query name, such as a list of variant names.

The developer may subclass this class to add any data that might be useful to attach to a 
query name. Such additions are often done in tandem with similar additions to an analogous 
subclass of the SNEvalNameData class. See the subclassing section for more details. 

The developer is responsible for deleting any SNQueryNameData objects created by their 
code. Typically, the developer will construct a single SNQueryNameData object, and 
compare it to multiple evaluation names ( SNEvalNameData objects). Once the query is 
completed, and the results have been retrieved, the query object is deleted. 

The developer may attach an SNResultsList object to the SNQueryNameData object for the
purpose of results management. The SNResultsList object handles issues of comparing and
sorting evaluation names that are determined to be matches. In addition, the SNResultsList 
object can trim the set of matching names down to the best N names, where N is specified 
by the developer. Use of an SNResultsList object is optional - if desired, the developer can 
provide their own match management. See the SNResultsList documentation for a more 
detailed discussion. 
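
To make the "best N names" behavior concrete, here is a minimal sketch of a list that keeps hits in descending score order and trims itself to a fixed size. It is not the SNResultsList implementation. HitSketch and TopHitsSketch are hypothetical types, and although getNumHits() and getHitAt() echo the calls used in the SNResultsList example in this documentation, their signatures here are assumptions.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record for one matching name and its score.
struct HitSketch {
    std::string name;
    double score;
};

// Keeps at most maxHits hits, best score first.
class TopHitsSketch {
public:
    explicit TopHitsSketch(std::size_t maxHits) : maxHits_(maxHits) {}

    void addHit(const HitSketch& hit) {
        // Insert before the first stored hit with a strictly lower score,
        // so the vector stays sorted in descending order.
        auto pos = std::find_if(hits_.begin(), hits_.end(),
                                [&hit](const HitSketch& h) { return h.score < hit.score; });
        hits_.insert(pos, hit);

        // Trim the worst hit when the list grows past its limit.
        if (hits_.size() > maxHits_)
            hits_.pop_back();
    }

    std::size_t getNumHits() const { return hits_.size(); }
    const HitSketch& getHitAt(std::size_t index) const { return hits_.at(index); }

private:
    std::size_t maxHits_;
    std::vector<HitSketch> hits_;
};
```

An application that provides its own match management, as the text allows, would need something along these lines.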



Subclassing 

A developer may wish to subclass SNQueryNameData to allow application-specific data to
be incorporated into the search process. For example, suppose an application needs to search
for names in a database that contains given name, surname, and birthdate. Suppose further
that the application needs to include birthdate as a factor in the search. By subclassing both
SNQueryNameData and SNEvalNameData, and adding a birthdate member data variable to
each, the developer can override methods within SNEvalNameData (e.g., calcNameScore) to
include age differential (how close in age the candidate is to the query) in the comparison.

The following are methods that can be overridden to provide specialized processing in a
subclass:

~SNQueryNameData()    Destructor for the class. Ensure that the destructor for your
                      subclass frees any resources your subclass allocates.
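
The birthdate scenario described above might be sketched as follows. Everything in this block is hypothetical: QueryNameStub and EvalNameStub stand in for SNQueryNameData and SNEvalNameData, the calcNameScore() signature is assumed because it is not documented here, and the penalty of 2 percent per year (capped at 10 years) is an invented rule chosen only to show where an age differential could enter the score.

```cpp
#include <algorithm>
#include <cstdlib>

// Hypothetical stand-ins for the SNAPI base classes.
class QueryNameStub {
public:
    virtual ~QueryNameStub() {}
};

class EvalNameStub {
public:
    virtual ~EvalNameStub() {}
    // Assumed signature; the stub simply returns a stored score.
    virtual double calcNameScore(const QueryNameStub& query) {
        (void)query;
        return baseScore;
    }
    double baseScore = 0.0;
};

// Application subclasses that each carry a birth year.
class BirthQueryName : public QueryNameStub {
public:
    int birthYear = 0;
};

class BirthEvalName : public EvalNameStub {
public:
    int birthYear = 0;

    double calcNameScore(const QueryNameStub& query) override {
        double score = EvalNameStub::calcNameScore(query);

        // Only adjust when the query also carries a birth year.
        const BirthQueryName* q = dynamic_cast<const BirthQueryName*>(&query);
        if (q == nullptr)
            return score;

        // Invented rule: lose 2 percent per year of age difference, capped at 10 years.
        int years = std::min(std::abs(birthYear - q->birthYear), 10);
        return score * (1.0 - 0.02 * years);
    }
};
```

Calling the base implementation first keeps the name-based score intact, so the subclass only layers its own adjustment on top.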



Methods Summary 



Common Methods:



SNQueryNameData()    Various constructors for the class.

getResultsList()     Returns the SNResultsList object associated with this query object.

setResultsList()     Attaches an SNResultsList object to this query object.



Attributes:

SNResultsList *resultsList;    A pointer to the SNResultsList object that is being used to manage the
                               matches for the query object. If no results list has been attached, the
                               value of this variable is NULL.



Method Details:



Constructors:

SNQueryNameData(SNQueryParms *qParms, char *gn, char *sn);

SNQueryNameData(SNQueryParms *qParms, char *gn, char *sn, char *mn);

SNQueryNameData(SNQueryParms *qParms, char *name, SNNameFormat nameFormat);



Each constructor creates a new SNQueryNameData object. All forms of the constructor take
a pointer to an SNQueryParms object, which should be the same one used to create the
SNEvalNameData objects that this object will be compared against.

The SNAPI system is based internally on a name model that considers given name and
surname. However, other constructors are provided for cases where an alternate format is
desired. In these cases, the constructor maps the supplied data into SNAPI's given
name/surname model.

The first form of the constructor takes a given name and surname. This is the most efficient
and accurate form, because the data already corresponds to SNAPI's internal name model.

The second form accommodates systems that have knowledge of a middle name. Currently,
SNAPI maps the middle name into the given name. Future versions of SNAPI may provide
more sophisticated handling of middle name data.

The third form accommodates systems that represent names as a single string, rather than
separate fields. This form takes an SNNameFormat parameter that dictates how the string
will be mapped into the given name/surname model. Values for this parameter include:



SN_SURNAME_COMMA_GIVENNAME    Expects the name string in the form "surname,
                              given name". If no comma is found in the string,
                              the given name is assumed to be unknown, and
                              the entire string is placed in the surname field.

SN_LAST_SEG_IS_SURNAME        Expects the string to be in the form "given_name
                              surname". The last segment is placed in the
                              surname field. All other segments are placed in
                              the given name field. If a string has just one
                              segment, that segment is placed in the surname
                              field, and the given name field is assumed to be
                              unknown. The processing is intelligent in that
                              TAQ values are recognized when determining the
                              last segment. This allows a name such as "Bob
                              Jones Jr." to be mapped correctly with a given
                              name of "Bob" and a surname of "Jones Jr",
                              rather than incorrectly assigning "Jr" as the
                              surname.

SN_NAME_FORMAT_UNKNOWN        Currently operates identically to
                              SN_LAST_SEG_IS_SURNAME. Future
                              versions of SNAPI might use linguistic expertise
                              to make automated decisions about parsing the
                              string into name fields.
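
The two mapping rules in the table above can be sketched in a few lines. This is an illustration only, not SNAPI's parser: NameParts and both function names are hypothetical, segments are assumed to be separated by whitespace, and isQualifier() uses a tiny hard-coded list as a stand-in for the API's TAQ values, whose actual contents are not given here.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: the given name/surname model.
struct NameParts {
    std::string given;
    std::string surname;
};

// Removes leading and trailing blanks and tabs.
static std::string trim(const std::string& s) {
    const char* ws = " \t";
    std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos)
        return "";
    std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// "surname, given name"; with no comma the whole string is the surname.
NameParts splitSurnameCommaGiven(const std::string& name) {
    std::size_t comma = name.find(',');
    if (comma == std::string::npos)
        return NameParts{"", trim(name)};
    return NameParts{trim(name.substr(comma + 1)), trim(name.substr(0, comma))};
}

// Assumed stand-in for TAQ recognition; the real list is not documented here.
static bool isQualifier(const std::string& seg) {
    return seg == "Jr" || seg == "Jr." || seg == "Sr" || seg == "Sr.";
}

// "given_name surname"; the last segment (plus a trailing qualifier) is the surname.
NameParts splitLastSegIsSurname(const std::string& name) {
    std::istringstream in(name);
    std::vector<std::string> segs;
    for (std::string s; in >> s;)
        segs.push_back(s);

    NameParts parts;
    if (segs.empty())
        return parts;

    std::size_t surnameStart = segs.size() - 1;
    if (surnameStart > 0 && isQualifier(segs.back()))
        --surnameStart;                       // keep "Jones Jr" together

    for (std::size_t i = 0; i < segs.size(); ++i) {
        std::string& dst = (i < surnameStart) ? parts.given : parts.surname;
        if (!dst.empty())
            dst += ' ';
        dst += segs[i];
    }
    return parts;
}
```

With these helpers, "Bob Earl Jones" and "Jones, Bob Earl" both map to a given name of "Bob Earl" and a surname of "Jones".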



Parameters:

qParms        A pointer to an SNQueryParms object. This should be the same object used
              to create the SNEvalNameData objects that this SNQueryNameData object
              will be compared against.

gn            A NULL terminated string that represents the given name.

sn            A NULL terminated string that represents the surname.

mn            A NULL terminated string that represents the middle name.

name          A NULL terminated string that represents all components of the name as a
              single string.

nameFormat    An enumerated type value that specifies how to interpret the name string
              when breaking it into given name and surname. See documentation for
              SNNameFormat for valid values.



Return Values:
None.

Memory Management:

The responsibility for deleting an SNQueryNameData lies with the developer. 
In general, an SNQueryNameData object should be deleted shortly after it has 
been compared to all SNEvalNameData objects that need to be considered for 
the query, and after all results from the query have been retrieved. 

Examples: 

The example below shows four equivalent SNQueryNameData objects being
constructed. In each case, the supplied name gets mapped to an
internal representation of: Given Name = "Bob Earl", Surname =
"Jones".



SNQueryNameData *query1;
SNQueryNameData *query2;
SNQueryNameData *query3;
SNQueryNameData *query4;
SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);

query1 = new SNQueryNameData(queryParms, "Bob Earl", "Jones");
query2 = new SNQueryNameData(queryParms, "Bob", "Jones", "Earl");
query3 = new SNQueryNameData(queryParms, "Bob Earl Jones",
                             SN_LAST_SEG_IS_SURNAME);
query4 = new SNQueryNameData(queryParms, "Jones, Bob Earl",
                             SN_SURNAME_COMMA_GIVENNAME);

// delete the allocated objects somewhere below.



SNResultsList * getResultsList()

Returns the SNResultsList object associated with this query object. If no SNResultsList 
object has been associated with this query object, the function returns NULL. In general, the 
developer does not need to call this function because they already have a pointer to the 
results list object (since they created it). 

Parameters: 

None. 

Return Values: 

A pointer to the SNResultsList object associated with this query object. If no 
SNResultsList object has been associated with this query object, the function 
returns NULL. 



void setResultsList(SNResultsList *aResultsList)



Sets the resultsList member variable. Call this member function to attach an SNResultsList
object to the query object. In general, an application will create a new SNResultsList object
for each query, and pass a pointer to the SNResultsList object to setResultsList(). After the
query is completed, the SNResultsList object should be deleted.



Parameters: 



aResultsList    A pointer to an SNResultsList object that will manage the matches for
                this query.



Return Values:
None.

Examples: 



The example below shows a sample query session using an SNResultsList 
object: 



SNEvalNameData  *candidate1;
SNEvalNameData  *candidate2;
SNQueryNameData *queryName;
SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
SNReturnCode     retCode;
SNResultsList   *myResultsList = NULL;

candidate1 = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
candidate2 = new SNEvalNameData(queryParms, "Earl", "Jhonas");
queryName  = new SNQueryNameData(queryParms, "James Earl", "Jones");

myResultsList = new SNResultsList(1);    // create a manager for just 1 match
queryName->setResultsList(myResultsList);

candidate1->performComp(queryName);
candidate2->performComp(queryName);

if (myResultsList->getNumHits() > 0) {
    SNEvalNameData *matchName = myResultsList->getHitAt(0);
    printf("best match was %s, %s\n", matchName->getSn(), matchName->getGn());
}
else
    printf("Neither name matched\n");

// delete the candidates only after the results have been read, in case
// the results list refers to them rather than copying them
delete candidate1;
delete candidate2;
delete myResultsList;
delete queryName;



Overview

The SNQueryParms class encapsulates all the tunable parameters that determine how a
name is processed and compared to another name. A simple application can create a
single SNQueryParms object, adjust it accordingly, and use it to perform all query
processing. A more complex application might need to re-adjust parameter settings for
each query, perhaps based on user selections.

An SNQueryParms object provides access to over forty parameters. Many of these 
parameters provide highly specialized tuning required only for particular 
circumstances. Other specialized parameters are used to address the nuances of names 
within a certain culture. 

In an effort to shield the developer from the complexity of these numerous parameters,
the API provides sets of pre-defined default parameters. These sets are organized by
culture; for example, the Hispanic parameters set contains values suitable for
evaluating Hispanic names. In general, most applications need to adjust only a few of
the available parameters to achieve desired results. The available cultural parameter
packages and their default values for each parameter are available for inspection. The
developer creates a set of parameters by constructing an SNQueryParms object, which
takes a cultural specifier as an argument. The culture used to create a parameters object
also determines the subset of TAQ values and variants that will be used when
processing names created with the parameters object.

Parameters often specify factors, thresholds, or scores. It is important to understand the 
distinction between each of these: 

Factor       A factor is a number that is applied to an existing score to arrive at a new score.
             For example, when comparing two name segments that are out of place, the
             OOPS factor is applied to the segment score to arrive at a new (lower) score.

Threshold    A threshold is a score that a comparison must achieve in order to be considered a
             match. SNQueryParms defines thresholds for the given name score, the surname
             score, and the composite name score (which considers the name as a whole).

Score        A score specifies a value to assign in a particular situation. For example,
             SNQueryParms defines a first name unknown score, which specifies the score to
             assign to a segment comparison when one of the segments is unknown. Note that
             a factor can still be applied to a score after it has been assigned.
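
A short worked example may help separate the three terms. The sketch below assumes that "applying" a factor means multiplying the score by it, which the text implies but does not state outright, and it uses made-up values. applyFactor() and meetsThreshold() are hypothetical helper names, not SNAPI functions.

```cpp
// Assumption: a factor scales a score by multiplication.
double applyFactor(double score, double factor) {
    return score * factor;
}

// A comparison is a match when its score meets or exceeds the threshold.
bool meetsThreshold(double score, double threshold) {
    return score >= threshold;
}

// Made-up numbers: a segment score of 0.90 with an out-of-place factor
// of 0.80 becomes about 0.72, which is then tested against a threshold.
bool outOfPlaceSegmentMatches(double threshold) {
    const double segmentScore = 0.90;   // score assigned by the comparison
    const double oopsFactor   = 0.80;   // factor for out-of-place segments
    return meetsThreshold(applyFactor(segmentScore, oopsFactor), threshold);
}
```

With these values the adjusted score passes a threshold of 0.70 and fails a threshold of 0.75.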

The documentation below organizes the methods of this class into usage categories of
Common, Specialized, and Advanced.

• Common methods adjust the most basic parameters, involving minimal
complexity. Most applications will only need to use methods from this category.

• Specialized methods are slightly more complex in nature, requiring a basic
understanding of the name scoring process to be used properly.

• Advanced methods address very specific behaviors within the name comparison,
and require a deep understanding of the issues involved in name analysis.
Because the API provides default values for these settings, most applications
will never need to use these methods.

Because SNQueryParms provides methods to retrieve and set the value of each
parameter, both methods are presented together. Further, many parameters operate on a
particular name field and therefore exist in pairs (one that affects given name
processing, and another that affects surname processing). Because these pairs of
functions are in all other respects identical, full documentation is provided with the
methods that operate on the given name. The analogous function for the surname
references the detail presented for the given name function.



Subclassing 

In general, an application should not need to subclass SNQueryParms. However, an 
advanced application may need to introduce a new, customized parameter that will be 
referenced during name comparisons. 

For example, an application that uses Social Security Number when comparing names
might introduce a tunable parameter called ssnScoreThreshold. This parameter would
specify an SSN score that a name would need to beat in order to be considered a
match. The parameter value would then be compared to the ssnScore in the
application's override of the SNEvalNameData::getCompResult() method. See the
SNEvalNameData class and its calcComponentScores() method for more details.

Note that in the above example, subclassing SNQueryParms is necessary only if we
need a tunable threshold parameter. If, on the other hand, the value of the threshold is a
fixed value, the threshold value itself can be specified in the override of the
SNEvalNameData::getCompResult() method.

The following are methods that can be overridden to provide specialized processing in
a subclass:

~SNQueryParms()    Destructor for the class. Ensure that the destructor for your subclass
                   frees any resources your subclass allocates.

loadFromFile()     Loads in parameter values from a file.
saveToFile()       Writes out parameter values to a file.



Methods Summary 



Common Methods:

SNQueryParms()
    Constructor for the class.

getGnScoreThresh()
setGnScoreThresh()
    Gets or sets the given name score threshold (the lowest given name
    score a name can receive and still be considered a match).

getGnWeight()
setGnWeight()
    Gets or sets the given name weight. This parameter controls the
    importance of the given name score (relative to the surname score)
    when computing the composite name score.

getScoreThresh()
setScoreThresh()
    Gets or sets the overall name score threshold (the lowest score a name
    can receive and still be considered a match).

getSnScoreThresh()
setSnScoreThresh()
    Gets or sets the surname score threshold (the lowest surname score a
    name can receive and still be considered a match).

getSnWeight()
setSnWeight()
    Gets or sets the surname weight. This parameter controls the
    importance of the surname score (relative to the given name score) when
    computing the composite name score.

getStatus()
    Returns the current status of the object. This function is used to ensure
    the successful construction of the object.
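
The summary above says the two weights control the relative importance of the given name and surname scores in the composite name score, but it does not give the formula. One plausible reading is a weighted average, sketched below. The formula, the function name, and the handling of non-positive weights are all assumptions made for illustration; SNAPI may combine the scores differently.

```cpp
// Assumed formula: weighted average of the two field scores.
double compositeNameScoreSketch(double gnScore, double snScore,
                                double gnWeight, double snWeight) {
    double totalWeight = gnWeight + snWeight;
    if (totalWeight <= 0.0)
        return 0.0;    // assumed fallback when no usable weights are supplied
    return (gnScore * gnWeight + snScore * snWeight) / totalWeight;
}
```

Under this reading, equal weights give a plain average, while raising the surname weight pulls the composite score toward the surname score.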



Specialized Methods:

getCheckGnUnknowns()
setCheckGnUnknowns()
    Gets or sets the flag that determines if given name segments should be
    checked for the special strings "NFN", "NMN", "FNU", and "MNU" when
    processing names. These strings are commonly used in legacy systems to
    indicate that a name is unknown or does not exist.

getCheckSnUnknowns()
setCheckSnUnknowns()
    Gets or sets the flag that determines if surname segments should be
    checked for the special strings "NLN" and "LNU" when processing names.
    These strings are commonly used in legacy systems to indicate that a
    name is unknown or does not exist.

getFNUScore()
setFNUScore()
    Gets or sets the score to assign to a given name segment comparison
    where the given name is unknown. A given name segment is considered
    unknown if it is blank, or if it is specified as "FNU" or "MNU".

getGnInitialOnInitialMatchScore()
setGnInitialOnInitialMatchScore()
    Gets or sets the score to assign to a given name segment comparison
    where both segments are initials (assuming given name initial matching
    is turned on via setMatchGnInitial()).

getGnInitialScore()
setGnInitialScore()
    Gets or sets the score to assign to a given name segment comparison
    involving an initial (assuming given name initial matching is turned on
    via setMatchGnInitial()).

getGnSegmentScoreMode()
setGnSegmentScoreMode()
    Gets or sets the given name segment score mode. The parameter
    determines how to handle multi-segment names.

getLNUScore()
setLNUScore()
    Gets or sets the score to assign to a surname segment comparison where
    the surname is unknown. A surname segment is considered unknown if it
    is blank, or if it is specified as "LNU".

getMatchGnInitial()
setMatchGnInitial()
    Gets or sets the flag that determines if a given name segment
    comparison should give special consideration to initials.

getMatchSnInitial()
setMatchSnInitial()
    Gets or sets the flag that determines if a surname segment comparison
    should give special consideration to initials.

getNFNScore()
setNFNScore()
    Gets or sets the score to assign to a given name segment comparison
    where the given name does not exist. Note that a blank name is
    considered to be unknown. Only given name segments specified as "NFN"
    or "NMN" are considered non-existent.

getNLNScore()
setNLNScore()
    Gets or sets the score to assign to a surname segment comparison where
    the surname does not exist. Note that a blank name is considered to be
    unknown. Only surname segments specified as "NLN" are considered
    non-existent.

getNoiseChars()
setNoiseChars()
    Gets or sets the set of characters that are discarded when processing
    a name.



getSegmentBreakChars()
setSegmentBreakChars()
    Gets or sets the set of characters that are considered segment
    separators.

getSnInitialScore()
setSnInitialScore()
    Gets or sets the score to assign to a surname segment comparison where
    one segment is an initial (assuming surname initial matching is turned
    on via setMatchSnInitial()).

getSnInitialOnInitialMatchScore()
setSnInitialOnInitialMatchScore()
    Gets or sets the score to assign to a surname segment comparison where
    both segments are initials (assuming surname initial matching is turned
    on via setMatchSnInitial()).

getSnSegmentScoreMode()
setSnSegmentScoreMode()
    Gets or sets the surname segment score mode. The parameter determines
    how to handle multi-segment names.
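
The distinction between unknown and non-existent given name segments, as described for the unknown-checking flag and the FNU and NFN scores, can be sketched as a small classifier. The enum and function names are hypothetical. The sketch also assumes that the special strings are matched exactly (upper case only) and that a blank segment counts as unknown regardless of the flag; neither point is stated in this documentation.

```cpp
#include <string>

// Hypothetical classification of a given name segment.
enum SegmentStatusSketch { SEG_KNOWN, SEG_UNKNOWN, SEG_NONEXISTENT };

SegmentStatusSketch classifyGivenNameSegment(const std::string& seg,
                                             bool checkUnknowns) {
    // A blank segment is considered unknown.
    if (seg.find_first_not_of(" \t") == std::string::npos)
        return SEG_UNKNOWN;

    // The special strings are examined only when the flag is on.
    if (!checkUnknowns)
        return SEG_KNOWN;

    if (seg == "FNU" || seg == "MNU")
        return SEG_UNKNOWN;        // name exists but is not known
    if (seg == "NFN" || seg == "NMN")
        return SEG_NONEXISTENT;    // name does not exist

    return SEG_KNOWN;
}
```

A surname version would follow the same shape, using "LNU" for unknown and "NLN" for non-existent.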



Advanced Methods: 



getAbsDelGnTAQFactor()
setAbsDelGnTAQFactor()
    Gets or sets the factor to apply to a given name segment score when a
    delete TAQ value is associated with one segment, but no delete TAQ
    value is associated with the other segment. The factor is applied only
    if given name TAQ scoring is enabled (see setGnTAQProcessingMode()).

getAbsDelSnTAQFactor()
setAbsDelSnTAQFactor()
    Gets or sets the factor to apply to a surname segment score when a
    delete TAQ value is associated with one segment, but no delete TAQ
    value is associated with the other segment. The factor is applied only
    if surname TAQ scoring is enabled (see setSnTAQProcessingMode()).

getAbsDisGnTAQFactor()
setAbsDisGnTAQFactor()
    Gets or sets the factor to apply to a given name segment score when a
    disregard TAQ value is associated with one segment, but no disregard
    TAQ value is associated with the other segment. The factor is applied
    only if given name TAQ scoring is enabled (see
    setGnTAQProcessingMode()).

getAbsDisSnTAQFactor()
setAbsDisSnTAQFactor()
    Gets or sets the factor to apply to a surname segment score when a
    disregard TAQ value is associated with one segment, but no disregard
    TAQ value is associated with the other segment. The factor is applied
    only if surname TAQ scoring is enabled (see setSnTAQProcessingMode()).

getCheckGnCompressedName()
setCheckGnCompressedName()
    Gets or sets the flag that determines if a compressed name comparison
    should be performed on the given name. See the method details for a
    description of the compressed name check.

getCheckSnCompressedName()
setCheckSnCompressedName()
    Gets or sets the flag that determines if a compressed name comparison
    should be performed on the surname. See the method details for a
    description of the compressed name check.

getDelGnTAQFactor()
setDelGnTAQFactor()
    Gets or sets the factor to apply to a given name segment score when a
    delete TAQ value is associated with one segment, and a different delete
    TAQ value is associated with the other segment. The factor is applied
    only if given name TAQ scoring is enabled (see
    setGnTAQProcessingMode()).

getDelSnTAQFactor()
setDelSnTAQFactor()
    Gets or sets the factor to apply to a surname segment score when a
    delete TAQ value is associated with one segment, and a different delete
    TAQ value is associated with the other segment. The factor is applied
    only if surname TAQ scoring is enabled (see setSnTAQProcessingMode()).

getDisGnTAQFactor()
setDisGnTAQFactor()
    Gets or sets the factor to apply to a given name segment score when a
    disregard TAQ value is associated with one segment, and a different
    disregard TAQ value is associated with the other segment. The factor is
    applied only if given name TAQ scoring is enabled (see
    setGnTAQProcessingMode()).

getDisSnTAQFactor()
setDisSnTAQFactor()
    Gets or sets the factor to apply to a surname segment score when a
    disregard TAQ value is associated with one segment, and a different
    disregard TAQ value is associated with the other segment. The factor is
    applied only if surname TAQ scoring is enabled (see
    setSnTAQProcessingMode()).

getGnAnchorFactor()
setGnAnchorFactor()
    Gets or sets the factor to apply to a given name segment score when the
    two segments are in place, but their ordinal position is not the anchor
    segment (as specified with the setGnAnchorSegmentMode() method).

getGnAnchorSegmentMode()
setGnAnchorSegmentMode()
    Gets or sets the given name anchor segment as either first, last, or
    none.



getGnCompressedNameScore()
setGnCompressedNameScore()
    Gets or sets the score assigned when two given names match via the
    compressed name check.

getGnOOPSFactor()
setGnOOPSFactor()
    Gets or sets the factor to apply to a given name segment score when the
    two segments are out of place (their ordinal position within the name
    field is different). Note that the anchor segment setting affects the
    determination of a segment's ordinal position.

getGnTAQProcessingMode()
setGnTAQProcessingMode()
    Gets or sets the TAQ processing mode for the given name field.



getSnAnchorFactor()
setSnAnchorFactor()
    Gets or sets the factor to apply to a surname segment score when the
    two segments are in place, but their ordinal position is not the anchor
    segment (as specified with the setSnAnchorSegmentMode() method).

getSnAnchorSegmentMode()
setSnAnchorSegmentMode()
    Gets or sets the surname anchor segment as either first, last, or none.



getSnCompressedNameScore()
setSnCompressedNameScore()
    Gets or sets the score assigned when two surnames match via the
    compressed name check.



getSnOOPSFactor()
setSnOOPSFactor()
    Gets or sets the factor to apply to a surname segment score when the
    two segments are out of place (their ordinal position within the name
    field is different). Note that the anchor segment setting affects the
    determination of a segment's ordinal position.

getSnTAQProcessingMode()
setSnTAQProcessingMode()
    Gets or sets the TAQ processing mode for the surname field.



getUseGnLeftBias()
setUseGnLeftBias()
    Gets or sets the flag that determines if character based given name
    segment comparisons will place more emphasis on leading characters.

getUseGnVariants()
setUseGnVariants()
    Gets or sets the flag that determines if the API will reference its
    internal list of variants when processing given name segments.

getUseSnLeftBias()
setUseSnLeftBias()
    Gets or sets the flag that determines if character based surname
    segment comparisons will place more emphasis on leading characters.

getUseSnVariants()
setUseSnVariants()
    Gets or sets the flag that determines if the API will reference its
    internal list of variants when processing surname segments.



Attributes: 

All attributes of the SNQueryParms class are protected and should not be accessed directly. Use the
get and set methods for the desired attribute to inspect or set a particular attribute.

getSegmentBreakChars() returns a pointer to the API's copy of the current
segment break characters.



double getSnInitialScore ()

SNReturnCode setSnInitialScore (double aScore)



Gets or sets the surname "Initial Match" score. This is the score to assign during a
segment comparison when one or the other segment (but not both) is an initial (it
consists of just one character). In order for the score to be assigned, surname initial
matching must have been turned on via the setMatchSnIntial() method. Otherwise, the
segments are compared via the standard character string comparison. See the
setSnInitialOnInitialMatchScore() function for detail on how the API handles
comparisons where both segments are initials.



Parameters:

aScore    A double value between 0.0 and 1.0 inclusive. Any value outside this range
          is an error.



Return Values:

setSnInitialScore() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                  The modification was successful.

SN_INVALID_SN_INIT_SCORE    The specified score is invalid.

getSnInitialScore() returns the current "Initial Match" score (a double).
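
The range check and return-code convention described above can be sketched as follows. This is a minimal illustration, not the API's implementation: the class name InitialScoreSketch, the default value, the sample score 0.85, and the choice to leave the stored score unchanged after a rejected call are all assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real LAS declarations are not shown in this document.
enum SNReturnCode { SN_SUCCESS, SN_INVALID_SN_INIT_SCORE };

class InitialScoreSketch {
public:
    double getSnInitialScore() const { return snInitialScore_; }

    // Accepts only values in [0.0, 1.0]. Keeping the old score on failure is an
    // assumption; the text only says that out-of-range values are an error.
    SNReturnCode setSnInitialScore(double aScore) {
        if (!(aScore >= 0.0 && aScore <= 1.0)) {
            return SN_INVALID_SN_INIT_SCORE;
        }
        snInitialScore_ = aScore;
        return SN_SUCCESS;
    }

private:
    double snInitialScore_ = 0.0;  // placeholder default, not taken from the source
};

// Usage sketch: always check the return code of a set call.
inline void demoInitialScore() {
    InitialScoreSketch parms;
    assert(parms.setSnInitialScore(0.85) == SN_SUCCESS);
    assert(parms.setSnInitialScore(1.5) == SN_INVALID_SN_INIT_SCORE);
    assert(parms.getSnInitialScore() == 0.85);  // rejected call changed nothing
}
```

The same pattern would apply to the other score and factor setters that return an SNReturnCode.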
double getSnInitialOnInitialMatchScore ()

SNReturnCode setSnInitialOnInitialMatchScore (double aScore)



Gets or sets the surname "Initial on Initial Match" score. This is the score to assign
during a segment comparison when both segments are initials (they consist of just one
character). In order for the score to be assigned, surname initial matching must have
been turned on via the setMatchSnIntial() method. Otherwise, the segments are
compared via the standard character string comparison.

This method is provided to give applications more control over how initials are treated
during a comparison. Most systems consider an initial match to be somewhat less exact
than an exact match. For example, the surnames "Jones" and "J" do not match as
closely as "Jones" and "Jones". However, the score to assign to the given names "R"
and "R" is subject to interpretation by the application. Such a comparison could be
considered an initial match, an exact match, or something in between. By providing
this tunable parameter, the API gives the developer the ability to decide exactly how
such situations should be handled.



Parameters:

aScore    A double value between 0.0 and 1.0 inclusive. Any value outside this range
          is an error.

Return Values:

setSnInitialOnInitialMatchScore() returns an SNReturnCode value
indicating the success of the operation:

SN_SUCCESS                                The modification was successful.

SN_INVALID_SN_INIT_ON_INIT_MATCH_SCORE    The specified score is invalid.

getSnInitialOnInitialMatchScore() returns the current surname "Initial
On Initial Match" score (a double).
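
To make the difference between the "Initial Match" score and the "Initial on Initial Match" score concrete, the sketch below selects between them for a pair of segments. It is an illustration only: the struct and function names are hypothetical, the scores 0.8 and 0.9 used in the test are made up, and the requirement that the initial equal the first character of the other segment (compared case-sensitively) is an assumption that the text does not spell out.

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of the three settings discussed in the text.
struct InitialScoresSketch {
    bool matchInitials;       // assumed to mirror the surname initial-matching flag
    double initialScore;      // exactly one segment is an initial
    double initialOnInitial;  // both segments are initials
};

// Returns true and writes 'score' when initial handling applies. Returns false
// when the caller should fall back to the standard character comparison.
bool scoreInitials(const std::string& a, const std::string& b,
                   const InitialScoresSketch& s, double& score) {
    assert(s.initialScore >= 0.0 && s.initialScore <= 1.0);
    assert(s.initialOnInitial >= 0.0 && s.initialOnInitial <= 1.0);
    if (!s.matchInitials || a.empty() || b.empty()) return false;
    const bool aIsInitial = a.size() == 1;
    const bool bIsInitial = b.size() == 1;
    if (!aIsInitial && !bIsInitial) return false;
    if (a[0] != b[0]) return false;  // assumption: the initial must agree
    score = (aIsInitial && bIsInitial) ? s.initialOnInitial : s.initialScore;
    return true;
}
```

With these settings, "Jones" against "J" would receive the initial score, while "R" against "R" would receive the initial-on-initial score.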
SNSegScoreMode getSnSegmentScoreMode ()

void setSnSegmentScoreMode (SNSegScoreMode aMode)



Gets or sets the surname segment score mode.

The surname segment score mode governs how the API computes a surname score
when both surnames involved in the comparison have more than one segment. See
the analogous setGnSegmentScoreMode() method for details.



Parameters:

aMode    An SNSegScoreMode value of SN_SEGMODE_HIGHEST, SN_SEGMODE_AV
         or SN_SEGMODE_LOWEST.



Return Values:

getSnSegmentScoreMode() returns the current surname segment score
mode.



double getAbsDelGnTAQFactor ()

SNReturnCode setAbsDelGnTAQFactor (double aFactor)



Gets or sets the given name "absent delete TAQ" factor. The "absent delete TAQ"
factor is applied to a segment score when one of the segments has an associated delete
TAQ, but the other does not. This factor should be viewed as a penalty that gets
applied to the segment score in the situation described above. See the discussion on
TAQs for an explanation of the different types of TAQ values. See the discussion on
TAQ Scoring for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDelGnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS                          The modification was successful.

SN_INVALID_ABS_DEL_GN_TAQ_FACTOR    The specified factor is invalid.

getAbsDelGnTAQFactor() returns the current "absent delete TAQ"
factor.
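
The "absent delete TAQ" penalty can be pictured with a small sketch. The function name and the boolean inputs are hypothetical, and treating "applied to a segment score" as a multiplication is an assumption; the TAQ Scoring discussion referenced above is the authority on how the factor is actually combined with the score.

```cpp
#include <cassert>

// Applies the "absent delete TAQ" factor when exactly one of the two segments
// carries a delete TAQ. Multiplication is an assumed way of applying the factor.
double applyAbsentDeleteTaq(double segmentScore,
                            bool firstHasDeleteTaq,
                            bool secondHasDeleteTaq,
                            double absDelFactor) {
    assert(absDelFactor >= 0.0 && absDelFactor <= 1.0);
    if (firstHasDeleteTaq != secondHasDeleteTaq) {
        return segmentScore * absDelFactor;  // penalty: one side lacks the TAQ
    }
    return segmentScore;                     // both or neither: score unchanged
}
```

Under this reading, a factor of 1.0 disables the penalty and a factor of 0.0 zeroes the segment score whenever only one segment has a delete TAQ.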



double getAbsDelSnTAQFactor ()

SNReturnCode setAbsDelSnTAQFactor (double aFactor)



Gets or sets the surname "absent delete TAQ" factor. The "absent delete TAQ" factor
is applied to a segment score when one of the segments has an associated delete TAQ,
but the other does not. This factor should be viewed as a penalty that gets applied to
the segment score in the situation described above. See the discussion on TAQs for an
explanation of the different types of TAQ values. See the discussion on TAQ Scoring
for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDelSnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS                          The modification was successful.

SN_INVALID_ABS_DEL_SN_TAQ_FACTOR    The specified factor is invalid.

getAbsDelSnTAQFactor() returns the current "absent delete TAQ"
factor.



double getAbsDisGnTAQFactor ()

SNReturnCode setAbsDisGnTAQFactor (double aFactor)



Gets or sets the given name "absent disregard TAQ" factor. The "absent disregard
TAQ" factor is applied to a segment score when one of the segments has an associated
disregard TAQ, but the other does not. This factor should be viewed as a penalty that
gets applied to the segment score in the situation described above. See the discussion
on TAQs for an explanation of the different types of TAQ values. See the discussion
on TAQ Scoring for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDisGnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS                          The modification was successful.

SN_INVALID_ABS_DIS_GN_TAQ_FACTOR    The specified factor is invalid.

getAbsDisGnTAQFactor() returns the current "absent disregard TAQ"
factor.



double getAbsDisSnTAQFactor ()

SNReturnCode setAbsDisSnTAQFactor (double aFactor)



Gets or sets the surname "absent disregard TAQ" factor. The "absent disregard TAQ"
factor is applied to a segment score when one of the segments has an associated
disregard TAQ, but the other does not. This factor should be viewed as a penalty that
gets applied to the segment score in the situation described above. See the discussion
on TAQs for an explanation of the different types of TAQ values. See the discussion
on TAQ Scoring for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDisSnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS                          The modification was successful.

SN_INVALID_ABS_DIS_SN_TAQ_FACTOR    The specified factor is invalid.

getAbsDisSnTAQFactor() returns the current "absent disregard TAQ"
factor.



BOOL getCheckGnCompressedName ()

void setCheckGnCompressedName (BOOL aBool)



Gets or sets the flag that determines if a compressed name comparison should be
performed on the given name.

After the given name has been scored, the API can optionally perform a compressed
name comparison on the given name. For this comparison, all segment break
characters and noise characters are removed from both the query and evaluation given
names. If the two strings match exactly, the given name score is set to the given name
compressed name score (setGnCompressedNameScore()), unless the existing given
name score is already higher than the given name compressed name score.

The given name compressed name check can be thought of as a way to squeeze all of a
given name's segments together. This can help solve problems associated with
discrepancies in the segmentation of names.
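
The compressed name check described above can be sketched in a few lines. The function names are hypothetical, and the set of characters to strip (space and hyphen in the test) is only a stand-in for the API's configured segment break and noise characters. The comparison here is case-sensitive, which is also an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Removes every character that appears in 'strip' (standing in for the
// segment break and noise characters).
std::string compressName(const std::string& name, const std::string& strip) {
    std::string out;
    for (char c : name) {
        if (strip.find(c) == std::string::npos) out.push_back(c);
    }
    return out;
}

// If the compressed names match exactly, raise the score to the compressed
// name score, but never lower a score that is already higher.
double applyCompressedNameCheck(double currentScore,
                                const std::string& queryName,
                                const std::string& evalName,
                                const std::string& strip,
                                double compressedNameScore) {
    assert(compressedNameScore >= 0.0 && compressedNameScore <= 1.0);
    if (compressName(queryName, strip) == compressName(evalName, strip)) {
        return std::max(currentScore, compressedNameScore);
    }
    return currentScore;
}
```

For instance, "Mary Ann" and "MaryAnn" compress to the same string, so a low segment-based score would be lifted to the compressed name score.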



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aBool A BOOL value of TRUE or FALSE. 



Return Values:

getCheckGnCompressedName() returns the current value of the flag.



BOOL getCheckSnCompressedName ()

void setCheckSnCompressedName (BOOL aBool)



Gets or sets the flag that determines if a compressed name comparison should be
performed on the surname.

After the surname has been scored, the API can optionally perform a compressed name
comparison on the surname. For this comparison, all segment break characters and
noise characters are removed from both the query and evaluation surnames. If the
two strings match exactly, the surname score is set to the surname compressed name
score (setSnCompressedNameScore()), unless the existing surname score is already
higher than the surname compressed name score.

The surname compressed name check can be thought of as a way to squeeze all of a
surname's segments together. This can help solve problems associated with
discrepancies in the segmentation of names.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aBool    A BOOL value of TRUE or FALSE.

Return Values:

getCheckSnCompressedName() returns the current value of the flag.
double getDisGnTAQFactor ()

SNReturnCode setDisGnTAQFactor (double aFactor)



Gets or sets the given name "disregard TAQ" factor. The "disregard TAQ" factor is
applied to a segment score when both segments have one or more associated disregard
TAQs, but no disregard TAQ value is common to both segments. This factor should
be viewed as a penalty that gets applied to the segment score in the situation described
above. See the discussion on TAQs for an explanation of the different types of TAQ
values. See the discussion on TAQ Scoring for information on how TAQs are used to
adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.

Return Values:

setDisGnTAQFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                      The modification was successful.

SN_INVALID_DIS_GN_TAQ_FACTOR    The specified factor is invalid.

getDisGnTAQFactor() returns the current "disregard TAQ" factor.
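
The condition that triggers the "disregard TAQ" factor (both segments have disregard TAQs, none in common) can be sketched as below. The function names, the representation of TAQ values as strings, and the sample values "DR" and "MR" are hypothetical; applying the factor by multiplication is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True when at least one TAQ value appears in both lists.
bool sharesTaq(const std::vector<std::string>& a,
               const std::vector<std::string>& b) {
    for (const std::string& taq : a) {
        if (std::find(b.begin(), b.end(), taq) != b.end()) return true;
    }
    return false;
}

// Applies the factor only when both segments carry disregard TAQs and no
// value is common to both. Multiplication is an assumed way of applying it.
double applyDisregardTaq(double segmentScore,
                         const std::vector<std::string>& firstTaqs,
                         const std::vector<std::string>& secondTaqs,
                         double disFactor) {
    assert(disFactor >= 0.0 && disFactor <= 1.0);
    if (!firstTaqs.empty() && !secondTaqs.empty() &&
        !sharesTaq(firstTaqs, secondTaqs)) {
        return segmentScore * disFactor;
    }
    return segmentScore;
}
```

The case where only one segment has a disregard TAQ is deliberately left untouched here, since the text assigns it to the separate "absent disregard TAQ" factor.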



double getDisSnTAQFactor ()

SNReturnCode setDisSnTAQFactor (double aFactor)



Gets or sets the surname "disregard TAQ" factor. The "disregard TAQ" factor is
applied to a segment score when both segments have one or more associated disregard
TAQs, but no disregard TAQ value is common to both segments. This factor should
be viewed as a penalty that gets applied to the segment score in the situation described
above. See the discussion on TAQs for an explanation of the different types of TAQ
values. See the discussion on TAQ Scoring for information on how TAQs are used to
adjust segment scores.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setDisSnTAQFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                      The modification was successful.

SN_INVALID_DIS_SN_TAQ_FACTOR    The specified factor is invalid.

getDisSnTAQFactor() returns the current "disregard TAQ" factor.



double getGnAnchorFactor ()

SNReturnCode setGnAnchorFactor (double aFactor)



Gets or sets the factor to apply to a given name segment score when the two segments
are in place, but their ordinal position is not the anchor segment (as specified with the
setGnAnchorSegmentMode() method).

The anchor factor should be viewed as a way to diminish the importance of a match if
the match occurs between two segments that are not in the anchor segment position.
For example, Arabic given names commonly include one or more segments. The first
segment is the more stable segment and should therefore be considered the anchor
segment. A match between two segments in the second given name position is
considered to be of less importance (relative to the first segment), and as such, that
segment score is diminished by applying the anchor factor.

Note that the given name anchor factor is only applied when the two segments are in
place (they are in the same position). Given name segments that are out of place are
adjusted by the given name "out of place segment" factor (setGnOOPSFactor()). In
addition, the given name anchor factor is only applied when the given name anchor
segment mode (setGnAnchorSegmentMode()) has been set.
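
The rule for when the anchor factor applies can be sketched as follows. The enumerator names come from the parameter description of setGnAnchorSegmentMode(), but the enum is redeclared here as a stand-in, the function name is hypothetical, and multiplying the score by the factor is an assumption about how the factor is applied.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the API's enum; only the enumerator names come from the text.
enum SNAnchorSegMode { SN_ANCHOR_SEG_NONE, SN_ANCHOR_SEG_FIRST, SN_ANCHOR_SEG_LAST };

// 'index' is the zero-based position of an in-place segment pair and 'count'
// is the number of segment positions. Out-of-place pairs are not handled here.
double applyAnchorFactor(double segmentScore, SNAnchorSegMode mode,
                         std::size_t index, std::size_t count,
                         double anchorFactor) {
    assert(index < count);
    assert(anchorFactor >= 0.0 && anchorFactor <= 1.0);
    if (mode == SN_ANCHOR_SEG_NONE) return segmentScore;  // no anchor set
    const std::size_t anchorIndex =
        (mode == SN_ANCHOR_SEG_FIRST) ? 0 : count - 1;
    return (index == anchorIndex) ? segmentScore
                                  : segmentScore * anchorFactor;
}
```

With the anchor on the first segment, a match in the second position is diminished while a match in the first position keeps its full score.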



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor A double value between 0.0 and 1.0 inclusive. 



Return Values:

setGnAnchorFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                     The modification was successful.

SN_INVALID_GN_ANCHOR_FACTOR    The specified factor is invalid.

getGnAnchorFactor() returns the current "given name anchor segment"
factor.



SNAnchorSegMode getGnAnchorSegmentMode ()

void setGnAnchorSegmentMode (SNAnchorSegMode anAnchorMode)



Gets or sets the given name anchor segment mode. Setting the anchor segment mode
causes the API to place emphasis on a particular segment within the given name (the
first segment, or the last segment). When this feature is turned off, all segments are
considered to be equally important. See the setGnAnchorFactor() method for details on
how the anchor segment affects segment scoring.






The given name anchor segment is also used to determine how segments in two names
are lined up (to determine which segments are in place or out of place). When the
anchor segment is set to SN_ANCHOR_SEG_NONE or
SN_ANCHOR_SEG_FIRST, segment alignment starts from the left (the first
segment). When the anchor segment is set to SN_ANCHOR_SEG_LAST, segment
alignment starts from the right (the last segment). See the setGnOOPSFactor() method
for details on how the API adjusts the score of segments that are out of place.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



anAnchorMode    An SNAnchorSegMode value:

SN_ANCHOR_SEG_NONE     No segment carries more importance than
                       another. Name segments are lined up on the
                       left to determine which segment comparisons
                       are in place.

SN_ANCHOR_SEG_FIRST    The first segment is the most important
                       segment. Name segments are lined up on the
                       left to determine which segment comparisons
                       are in place.

SN_ANCHOR_SEG_LAST     The last (right most) is the most important
                       segment. Name segments are lined up on the
                       right to determine which segment
                       comparisons are in place.



Return Values: 



getGnAnchorSegmentMode() returns the current "given name anchor
segment" mode.



double getGnCompressedNameScore ()

SNReturnCode setGnCompressedNameScore (double aScore)



Gets or sets the score to assign to a successful given name compressed name
comparison. See the setCheckGnCompressedName() method for detail on compressed
name comparisons.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aScore    A double value between 0.0 and 1.0 inclusive.



Return Values:

setGnCompressedNameScore() returns an SNReturnCode value
indicating the success of the operation:

SN_SUCCESS                             The modification was successful.

SN_INVALID_GN_COMPRESSED_NAME_SCORE    The specified score is invalid.

getGnCompressedNameScore() returns the current "given name
compressed name" score.



double getGnOOPSFactor ()

SNReturnCode setGnOOPSFactor (double aFactor)



Gets or sets the given name "out of place segment" factor. This is the factor that is
applied to a segment score when the two segments are out of place (their ordinal
positions are different). The given name anchor segment mode
(setGnAnchorSegmentMode()) affects how segment alignment is performed.

To understand how alignment affects in place/out of place determination, consider the
given names "Earl Bob" and "James Earl Bob". If we align these names on the left,
we get:

Name 1:  | Earl  | Bob  |     |
Name 2:  | James | Earl | Bob |

If we line the names up on the right, we get:

Name 1:  |       | Earl | Bob |
Name 2:  | James | Earl | Bob |



Notice that in the first case, the "Earl" and "Bob" segments are out of place, so we 
would apply the given name "out of place segment" factor to their segment scores. In 
the second case, because we align on the right, the segments are in place, so their 
segment scores are not adjusted. 



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setGnOOPSFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                   The modification was successful.

SN_INVALID_GN_OOPS_FACTOR    The specified factor is invalid.

getGnOOPSFactor() returns the current given name "out of place
segment" factor.
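
The in place/out of place determination from the "Earl Bob" example can be sketched as follows. The function names are hypothetical, the enum is a stand-in that reuses the enumerator names from the anchor segment mode description, and applying the factor by multiplication is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the API's enum; only the enumerator names come from the text.
enum SNAnchorSegMode { SN_ANCHOR_SEG_NONE, SN_ANCHOR_SEG_FIRST, SN_ANCHOR_SEG_LAST };

// i and j are zero-based segment indexes in names with n1 and n2 segments.
// NONE and FIRST align from the left, LAST aligns from the right.
bool segmentsInPlace(std::size_t i, std::size_t n1,
                     std::size_t j, std::size_t n2,
                     SNAnchorSegMode mode) {
    assert(i < n1 && j < n2);
    if (mode == SN_ANCHOR_SEG_LAST) {
        return (n1 - i) == (n2 - j);  // same distance from the right edge
    }
    return i == j;                    // same distance from the left edge
}

// Out-of-place pairs have the "out of place segment" factor applied.
double applyOopsFactor(double segmentScore, bool inPlace, double oopsFactor) {
    assert(oopsFactor >= 0.0 && oopsFactor <= 1.0);
    return inPlace ? segmentScore : segmentScore * oopsFactor;
}
```

For "Earl Bob" (2 segments) against "James Earl Bob" (3 segments), "Earl" sits at index 0 in the first name and index 1 in the second: out of place under left alignment, in place under right alignment.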
SNTAQProcessingMode getGnTAQProcessingMode ()

void setGnTAQProcessingMode (SNTAQProcessingMode aMode)



Gets or sets the mode that determines how to process given name TAQ values.

The following modes are supported:

Mode                       Description

SN_TAQ_MODE_IGNORE         The API will not check given name segments to see if
                           they are TAQ values.

SN_TAQ_MODE_JUST_REMOVE    The API will check each given name segment to see if
                           it is a TAQ value. If so, the value is removed as
                           though it never existed.

SN_TAQ_MODE_IGNORE         The API will check each given name segment to see if
                           it is a TAQ value. If so, the segment gets associated
                           with its proper stem segment, and is used in the
                           computation of the stem segment's score.

See the discussion on TAQs for an explanation of the different types of TAQ values.
See the discussion on TAQ Scoring for information on how TAQs are used to adjust
segment scores.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aMode    An SNTAQProcessingMode value of either SN_TAQ_MODE_IGNORE or
         SN_TAQ_MODE_JUST_REMOVE.



Return Values:

getGnTAQProcessingMode() returns the current given name TAQ
processing mode.






double getSnAnchorFactor ()

SNReturnCode setSnAnchorFactor (double aFactor)



Gets or sets the factor to apply to a surname segment score when the two segments are
in place, but their ordinal position is not the anchor segment (as specified with the
setSnAnchorSegmentMode() method).

The anchor factor should be viewed as a way to diminish the importance of a match if
the match occurs between two segments that are not in the anchor segment position.
For example, Hispanic surnames commonly include two segments. The first segment
is the true surname and should therefore be considered the anchor segment. A match
between two segments in the second position is considered to be of less importance
(relative to the first segment), and as such, that segment score is diminished by
applying the anchor factor.

Note that the surname anchor factor is only applied when the two segments are in
place (they are in the same position). Surname segments that are out of place are
adjusted by the surname "out of place segment" factor (setSnOOPSFactor()). In
addition, the surname anchor factor is only applied when the surname anchor segment
mode (setSnAnchorSegmentMode()) has been set.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setSnAnchorFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                     The modification was successful.

SN_INVALID_SN_ANCHOR_FACTOR    The specified factor is invalid.

getSnAnchorFactor() returns the current "surname anchor segment"
factor.



SNAnchorSegMode getSnAnchorSegmentMode ()

void setSnAnchorSegmentMode (SNAnchorSegMode anAnchorMode)



Gets or sets the surname anchor segment mode. Setting the anchor segment mode
causes the API to place emphasis on a particular segment within the surname (the first
segment, or the last segment). When this feature is turned off, all segments are
considered to be equally important. See the setSnAnchorFactor() method for details on
how the anchor segment affects segment scoring.

The surname anchor segment is also used to determine how segments in two names
are lined up (to determine which segments are in place or out of place). When the
anchor segment is set to SN_ANCHOR_SEG_NONE or
SN_ANCHOR_SEG_FIRST, segment alignment starts from the left (the first
segment). When the anchor segment is set to SN_ANCHOR_SEG_LAST, segment
alignment starts from the right (the last segment). See the setSnOOPSFactor() method
for details on how the API adjusts the score of segments that are out of place.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



anAnchorMode    An SNAnchorSegMode value:

SN_ANCHOR_SEG_NONE     No segment carries more importance
                       than another. Name segments are lined
                       up on the left to determine which
                       segment comparisons are in place.

SN_ANCHOR_SEG_FIRST    The first segment is the most important
                       segment. Name segments are lined up on
                       the left to determine which segment
                       comparisons are in place.

SN_ANCHOR_SEG_LAST     The last (right most) is the most
                       important segment. Name segments are
                       lined up on the right to determine which
                       segment comparisons are in place.



Return Values:

getSnAnchorSegmentMode() returns the current "surname anchor
segment" mode.



double getSnCompressedNameScore ()

SNReturnCode setSnCompressedNameScore (double aScore)



Gets or sets the score to assign to a successful surname compressed name comparison.
See the setCheckSnCompressedName() method for detail on compressed name
comparisons.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aScore    A double value between 0.0 and 1.0 inclusive.

Return Values:

setSnCompressedNameScore() returns an SNReturnCode value
indicating the success of the operation:

SN_SUCCESS                             The modification was successful.

SN_INVALID_SN_COMPRESSED_NAME_SCORE    The specified score is invalid.

getSnCompressedNameScore() returns the current "surname
compressed name" score.



double getSnOOPSFactor ()

SNReturnCode setSnOOPSFactor (double aFactor)



Gets or sets the surname "out of place segment" factor. This is the factor that is applied
to a segment score when the two segments are out of place (their ordinal positions are
different). The surname anchor segment mode (setSnAnchorSegmentMode()) affects how
segment alignment is performed.

To understand how alignment affects in place/out of place determination, consider the
surnames "Garcia Gomez" and "Valdez Garcia Gomez". If we align these names on
the left, we get:

Name 1:  | Garcia | Gomez  |       |
Name 2:  | Valdez | Garcia | Gomez |

If we line the names up on the right, we get:

Name 1:  |        | Garcia | Gomez |
Name 2:  | Valdez | Garcia | Gomez |

Notice that in the first case, the "Garcia" and "Gomez" segments are out of place, so
we would apply the surname "out of place segment" factor to their segment scores. In
the second case, because we align on the right, the segments are in place, so their
segment scores are not adjusted.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 
Parameters:

aFactor    A double value between 0.0 and 1.0 inclusive.



Return Values:

setSnOOPSFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS                   The modification was successful.

SN_INVALID_SN_OOPS_FACTOR    The specified factor is invalid.

getSnOOPSFactor() returns the current surname "out of place segment"
factor.



SNTAQProcessingMode getSnTAQProcessingMode ()

void setSnTAQProcessingMode (SNTAQProcessingMode aMode)



Gets or sets the mode that determines how to process surname TAQ values.

The following modes are supported:

Mode                       Description

SN_TAQ_MODE_IGNORE         The API will not check surname segments to see if
                           they are TAQ values.

SN_TAQ_MODE_JUST_REMOVE    The API will check each surname segment to see if
                           it is a TAQ value. If so, the value is removed as
                           though it never existed.

SN_TAQ_MODE_IGNORE         The API will check each surname segment to see if
                           it is a TAQ value. If so, the segment gets associated
                           with its proper stem segment, and is used in the
                           computation of the stem segment's score.

See the discussion on TAQs for an explanation of the different types of TAQ values.
See the discussion on TAQ Scoring for information on how TAQs are used to adjust
segment scores.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aMode    An SNTAQProcessingMode value of either SN_TAQ_MODE_IGNORE or
         SN_TAQ_MODE_JUST_REMOVE.



Return Values:

getSnTAQProcessingMode() returns the current surname TAQ
processing mode.



BOOL getUseGnLeftBias ()

void setUseGnLeftBias (BOOL aBool)



Gets or sets the flag that determines if given name segment comparisons should be
biased towards matches that occur at the beginning of the segment. When this feature
is turned on, as we move to the right, matching character pairs are given progressively
less credit in calculating a segment score. When this feature is turned off, all matching
character pairs receive full credit, regardless of their position within their respective
segment.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aBool    A BOOL value of TRUE or FALSE.

Return Values:

getUseGnLeftBias() returns the current value of the flag (TRUE or
FALSE).
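
The effect of the left bias flag can be illustrated with a toy position-by-position comparison. This is not the API's character comparison algorithm, which this section does not describe; the function name, the 1/(k+1) weighting, and the normalization by total weight are all assumptions chosen only to show matches near the start of a segment earning more credit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Toy comparison: characters are compared position by position. With left
// bias on, position k carries weight 1/(k+1); with it off, every position
// carries weight 1. The weighting scheme is illustrative only.
double positionalScore(const std::string& a, const std::string& b,
                       bool useLeftBias) {
    const std::size_t longest = std::max(a.size(), b.size());
    const std::size_t shortest = std::min(a.size(), b.size());
    if (longest == 0) return 0.0;
    double matched = 0.0;
    double total = 0.0;
    for (std::size_t k = 0; k < longest; ++k) {
        const double weight =
            useLeftBias ? 1.0 / static_cast<double>(k + 1) : 1.0;
        total += weight;
        if (k < shortest && a[k] == b[k]) matched += weight;
    }
    return matched / total;
}
```

In this sketch "Jon" against "Joe" and "Jon" against "Ron" score the same with the flag off, but with the flag on the pair that agrees on its leading characters scores higher.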



BOOL getUseGnVariants ()

void setUseGnVariants (BOOL aBool)



Gets or sets the flag that determines if given name segment comparisons should check
to see if the two segments are linguistic variants of each other.

The API maintains internal tables that describe relationships between name variants.
Each variant relationship has an associated score and culture. When comparing two
segments, the API examines the value of the "use given name variants" flag. If it is
turned on, the internal variant tables are searched to see if there is a variant relationship
between the two segments, within the culture associated with this query (as determined
by the SNQueryParms object used to perform the comparison). There is also a generic
set of variants that are searched independent of culture. If a variant relationship is
found, its associated score is assigned to the segment score, and no character based
comparison is performed.

At present, the set of variants and their associated scores cannot be modified by the
developer.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aBool    A BOOL value of TRUE or FALSE.

Return Values:

getUseGnVariants() returns the current value of the flag (TRUE or
FALSE).
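
The variant lookup order described above (culture-specific table first, then the generic set, with the character based comparison as the fallback) can be sketched as below. The table layout, the type and function names, the "generic" culture key, the symmetric lookup, and the sample entry with its score are all hypothetical; the API's internal tables are not exposed to the developer.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical table keyed by (culture, segment, segment) with a variant score.
typedef std::tuple<std::string, std::string, std::string> VariantKey;
typedef std::map<VariantKey, double> VariantTable;

// Looks in the query's culture first, then in the assumed "generic" set.
bool findVariant(const VariantTable& table, const std::string& culture,
                 const std::string& a, const std::string& b, double& score) {
    const std::string cultures[2] = { culture, "generic" };
    for (const std::string& c : cultures) {
        auto it = table.find(VariantKey(c, a, b));
        if (it == table.end()) it = table.find(VariantKey(c, b, a));
        if (it != table.end()) {
            score = it->second;
            return true;
        }
    }
    return false;
}

// A variant hit replaces the character based score; otherwise the character
// based score (computed elsewhere) is used.
double scoreSegment(const VariantTable& table, const std::string& culture,
                    const std::string& a, const std::string& b,
                    bool useVariants, double characterBasedScore) {
    assert(characterBasedScore >= 0.0 && characterBasedScore <= 1.0);
    double variantScore = 0.0;
    if (useVariants && findVariant(table, culture, a, b, variantScore)) {
        return variantScore;
    }
    return characterBasedScore;
}
```

Passing the character based score in as a value keeps the sketch short; a real implementation would presumably skip that computation entirely when a variant is found.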



BOOL getUseSnLeftBias ()

void setUseSnLeftBias (BOOL aBool)



Gets or sets the flag that determines if surname segment comparisons should be biased
towards matches that occur at the beginning of the segment. When this feature is
turned on, as we move to the right, matching character pairs are given progressively
less credit in calculating a segment score. When this feature is turned off, all matching
character pairs receive full credit, regardless of their position within their respective
segment.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aBool    A BOOL value of TRUE or FALSE.



Return Values:

getUseSnLeftBias() returns the current value of the flag (TRUE or
FALSE).



BOOL getUseSnVariants ()

void setUseSnVariants (BOOL aBool)



SNSegScoreMode getSnSegmentScoreMode ()

void setSnSegmentScoreMode (SNSegScoreMode aMode)



Gets or sets the surname segment score mode. 

The surname segment score mode governs how the API computes a surname score
when both surnames involved in the comparison have more than one segment. See
the analogous setGnSegmentScoreMode() method for details.



Parameters:

aMode    An SNSegScoreMode value of SN_SEGMODE_HIGHEST, SN_SEGMODE_AV
         or SN_SEGMODE_LOWEST.



Return Values:

getSnSegmentScoreMode() returns the current surname segment score
mode.
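
One plausible reading of the segment score mode is that it selects how several segment scores are folded into a single name score. The sketch below shows that reading only; the details live in the setGnSegmentScoreMode() description, which is not reproduced here. The enum and function names are hypothetical, and the middle enumerator's name is cut off in this text, so treating it as an average is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical stand-in for SNSegScoreMode. "Average" is a guess at the
// mode whose name is truncated in the parameter description.
enum SegModeSketch { SEGMODE_HIGHEST, SEGMODE_AVERAGE, SEGMODE_LOWEST };

// Folds the per-segment scores into one name score according to the mode.
double combineSegmentScores(const std::vector<double>& scores,
                            SegModeSketch mode) {
    assert(!scores.empty());
    switch (mode) {
        case SEGMODE_HIGHEST:
            return *std::max_element(scores.begin(), scores.end());
        case SEGMODE_LOWEST:
            return *std::min_element(scores.begin(), scores.end());
        case SEGMODE_AVERAGE:
            break;
    }
    return std::accumulate(scores.begin(), scores.end(), 0.0) /
           static_cast<double>(scores.size());
}
```

For two segment scores of 0.5 and 1.0, the three modes in this sketch yield 1.0, 0.75, and 0.5 respectively.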



double 


getAbsDelGnTAQFactor () 


SNReturnCode 


setAbsDelGnTAQFactor (double aFactor) 



Gets or sets the given name "absent delete TAQ" factor. The "absent delete TAQ" 
factor is applied to a segment score when one of the segments has an associated delete 
TAQ, but the other does not. This factor should be viewed as a penalty that gets 
applied to the segment score in the situation described above. See the discussion on 
TAOs for an explanation of the different types of TAQ values. See the discussion on 
TAP Scoring for information on how TAQs are used to adjust segment scores. 



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values: 



setAbsDelGnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_ABS_DEL_GN_TAQ_FACTOR The specified factor is invalid.

getAbsDelGnTAQFactor() returns the current "absent delete TAQ"
factor.
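The get/set contract shared by the factor methods can be sketched as below. This is only an illustration: FactorHolder, ReturnCode, and applyTo are hypothetical names, the default value of 1.0 is assumed, and applying the penalty by multiplication is an assumption that the text does not confirm.

```cpp
#include <cassert>

// Hypothetical stand-ins for the SNReturnCode values named in the text.
enum ReturnCode { SUCCESS, INVALID_FACTOR };

// Minimal holder for one factor, mimicking the documented get/set contract.
class FactorHolder {
public:
    FactorHolder() : factor_(1.0) {}  // default of 1.0 is an assumption

    // Accepts only values in [0.0, 1.0]; any other value is rejected
    // and the stored factor is left unchanged.
    ReturnCode setFactor(double aFactor) {
        if (!(aFactor >= 0.0 && aFactor <= 1.0)) return INVALID_FACTOR;
        factor_ = aFactor;
        return SUCCESS;
    }

    double getFactor() const { return factor_; }

    // Assumption: the penalty is applied by multiplying the segment score.
    double applyTo(double segmentScore) const {
        assert(segmentScore >= 0.0 && segmentScore <= 1.0);
        return segmentScore * factor_;
    }

private:
    double factor_;
};
```

Under these assumptions, a factor of 0.5 halves a segment score, and a call such as setFactor(1.5) is rejected without altering the stored value.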



double getAbsDelSnTAQFactor ()

SNReturnCode setAbsDelSnTAQFactor (double aFactor)



Gets or sets the surname "absent delete TAQ" factor. The "absent delete TAQ" factor
is applied to a segment score when one of the segments has an associated delete TAQ,
but the other does not. This factor should be viewed as a penalty that gets applied to
the segment score in the situation described above. See the discussion on TAQs for an
explanation of the different types of TAQ values. See the discussion on TAQ Scoring
for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values: 



setAbsDelSnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_ABS_DEL_SN_TAQ_FACTOR The specified factor is invalid.

getAbsDelSnTAQFactor() returns the current "absent delete TAQ"
factor.



double getAbsDisGnTAQFactor () 

SNReturnCode setAbsDisGnTAQFactor (double aFactor)



Gets or sets the given name "absent disregard TAQ" factor. The "absent disregard 
TAQ" factor is applied to a segment score when one of the segments has an associated 
disregard TAQ, but the other does not. This factor should be viewed as a penalty that
gets applied to the segment score in the situation described above. See the discussion
on TAQs for an explanation of the different types of TAQ values. See the discussion
on TAQ Scoring for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDisGnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_ABS_DIS_GN_TAQ_FACTOR The specified factor is invalid.

getAbsDisGnTAQFactor() returns the current "absent disregard TAQ"
factor.



double getAbsDisSnTAQFactor ()

SNReturnCode setAbsDisSnTAQFactor (double aFactor)



Gets or sets the surname "absent disregard TAQ" factor. The "absent disregard TAQ"
factor is applied to a segment score when one of the segments has an associated
disregard TAQ, but the other does not. This factor should be viewed as a penalty that
gets applied to the segment score in the situation described above. See the discussion
on TAQs for an explanation of the different types of TAQ values. See the discussion
on TAQ Scoring for information on how TAQs are used to adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values:

setAbsDisSnTAQFactor() returns an SNReturnCode value indicating
the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_ABS_DIS_SN_TAQ_FACTOR The specified factor is invalid.

getAbsDisSnTAQFactor() returns the current "absent disregard TAQ"
factor.



BOOL getCheckGnCompressedName ()

void setCheckGnCompressedName (BOOL aBool)



Gets or sets the flag that determines if a compressed name comparison should be
performed on the given name.

After the given name has been scored, the API can optionally perform a compressed
name comparison on the given name. For this comparison, all segment break
characters and noise characters are removed from both the query and evaluation given
names. If the two strings match exactly, the given name score is set to the given name
compressed name score (setGnCompressedNameScore()), unless the existing given
name score is already higher than the given name compressed name score.

The given name compressed name check can be thought of as a way to squeeze all of a
given name's segments together. This can help solve problems associated with
discrepancies in the segmentation of names.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aBool A BOOL value of TRUE or FALSE.



Return Values:

getCheckGnCompressedName() returns the current value of the flag.



BOOL getCheckSnCompressedName ()

void setCheckSnCompressedName (BOOL aBool)



Gets or sets the flag that determines if a compressed name comparison should be
performed on the surname.

After the surname has been scored, the API can optionally perform a compressed name
comparison on the surname. For this comparison, all segment break characters and
noise characters are removed from both the query and evaluation surnames. If the
two strings match exactly, the surname score is set to the surname compressed name
score (setSnCompressedNameScore()), unless the existing surname score is already
higher than the surname compressed name score.

The surname compressed name check can be thought of as a way to squeeze all of a 
surname's segments together. This can help solve problems associated with 
discrepancies in the segmentation of names. 
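The compressed name check described above can be sketched as follows. This is a minimal illustration, not the API's implementation: compressName and applyCompressedCheck are hypothetical helpers, and the set of segment break and noise characters (space, hyphen, apostrophe, period) is an assumption, since the text does not list them.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Removes every character listed in dropChars from name.
std::string compressName(const std::string& name, const std::string& dropChars) {
    std::string out;
    for (std::string::size_type i = 0; i < name.size(); ++i)
        if (dropChars.find(name[i]) == std::string::npos) out += name[i];
    return out;
}

// If the compressed forms match exactly, the score becomes the compressed
// name score, unless the existing score is already higher.
double applyCompressedCheck(const std::string& query, const std::string& eval,
                            double existingScore, double compressedScore) {
    assert(compressedScore >= 0.0 && compressedScore <= 1.0);
    const std::string dropChars = " -'.";  // assumed break and noise characters
    if (compressName(query, dropChars) == compressName(eval, dropChars))
        return std::max(existingScore, compressedScore);
    return existingScore;
}
```

With a compressed name score of 0.95, "Abdul Rahman" against "Abdul-Rahman" would be raised from 0.6 to 0.95, while an existing score of 0.99 would be left alone.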



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aBool A BOOL value of TRUE or FALSE. 



Return Values: 

getCheckSnCompressedName() returns the current value of the flag.



SNQueryParms Class Documentation 




TAQs, but no disregard TAQ value is common to both segments. This factor should 
be viewed as a penalty that gets applied to the segment score in the situation described 
above. See the discussion on TAQs for an explanation of the different types of TAQ
values. See the discussion on TAQ Scoring for information on how TAQs are used to
adjust segment scores. 



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor A double value between 0.0 and 1.0 inclusive.



Return Values: 

setDisGnTAQFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_DIS_GN_TAQ_FACTOR The specified factor is invalid.

getDisGnTAQFactor() returns the current "disregard TAQ" factor.
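The difference between the "absent disregard TAQ" factor and the "disregard TAQ" factor can be illustrated with a small selector. This is a sketch under assumptions: TaqSet, shareValue, and disregardTaqMultiplier are hypothetical names, the sample TAQ values in the usage note are made up, and returning 1.0 (no penalty) when a value is shared or when neither segment has a TAQ is assumed rather than stated in the text.

```cpp
#include <cassert>
#include <set>
#include <string>

typedef std::set<std::string> TaqSet;

// True if the two sets share at least one value.
bool shareValue(const TaqSet& a, const TaqSet& b) {
    for (TaqSet::const_iterator it = a.begin(); it != a.end(); ++it)
        if (b.count(*it) > 0) return true;
    return false;
}

// Picks the multiplier for a segment score from the disregard TAQs
// attached to the query segment and the evaluation segment.
double disregardTaqMultiplier(const TaqSet& query, const TaqSet& eval,
                              double absentFactor, double disregardFactor) {
    assert(absentFactor >= 0.0 && absentFactor <= 1.0);
    assert(disregardFactor >= 0.0 && disregardFactor <= 1.0);
    if (query.empty() && eval.empty()) return 1.0;           // no TAQs involved
    if (query.empty() != eval.empty()) return absentFactor;  // only one side has a TAQ
    if (!shareValue(query, eval)) return disregardFactor;    // both have TAQs, none in common
    return 1.0;                                              // a shared TAQ: no penalty (assumed)
}
```

For example, with an absent factor of 0.9 and a disregard factor of 0.8, a segment tagged {"DR"} compared with an untagged segment yields 0.9, and {"DR"} compared with {"MR"} yields 0.8.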



double getDisSnTAQFactor ()

SNReturnCode setDisSnTAQFactor (double aFactor)



Gets or sets the surname "disregard TAQ" factor. The "disregard TAQ" factor is 
applied to a segment score when both segments have one or more associated disregard 
TAQs, but no disregard TAQ value is common to both segments. This factor should 
be viewed as a penalty that gets applied to the segment score in the situation described 
above. See the discussion on TAQs for an explanation of the different types of TAQ 
values. See the discussion on TAQ Scoring for information on how TAQs are used to
adjust segment scores.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters: 



aFactor A double value between 0.0 and 1.0 inclusive.



Return Values:

setDisSnTAQFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_DIS_SN_TAQ_FACTOR The specified factor is invalid.

getDisSnTAQFactor() returns the current "disregard TAQ" factor.



double getGnAnchorFactor ()

SNReturnCode setGnAnchorFactor (double aFactor)



Gets or sets the factor to apply to a given name segment score when the two segments
are in place, but their ordinal position is not the anchor segment (as specified with the
setGnAnchorSegmentMode() method).

The anchor factor should be viewed as a way to diminish the importance of a match if
the match occurs between two segments that are not in the anchor segment position.
For example, Arabic given names commonly include one or more segments. The first
segment is the more stable segment and should therefore be considered the anchor
segment. A match between two segments in the second given name position is
considered to be of less importance (relative to the first segment), and as such, that
segment score is diminished by applying the anchor factor.

Note that the given name anchor factor is only applied when the two segments are in
place (they are in the same position). Given name segments that are out of place are
adjusted by the given name "out of place segment" score (setGnOOPSFactor()). In
addition, the given name anchor factor is only applied when the given name anchor
segment mode (setGnAnchorSegmentMode()) has been set.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor A double value between 0.0 and 1.0 inclusive.



Return Values: 

setGnAnchorFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_GN_ANCHOR_FACTOR The specified factor is invalid.

getGnAnchorFactor() returns the current "given name anchor segment"
factor.
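The effect of the anchor factor on in-place segment pairs can be sketched as below. AnchorMode and applyAnchorFactor are hypothetical names that mirror the three anchor modes in the text, and reducing the score by multiplication is an assumption. The sketch also assumes both names have the same number of segments, so that one position index describes the pair.

```cpp
#include <cassert>

// Hypothetical mirror of the three anchor segment modes named in the text.
enum AnchorMode { ANCHOR_NONE, ANCHOR_FIRST, ANCHOR_LAST };

// Adjusts the score of an in-place segment pair at zero-based position
// in a name that has segmentCount segments.
double applyAnchorFactor(double segmentScore, int position, int segmentCount,
                         AnchorMode mode, double anchorFactor) {
    assert(position >= 0 && position < segmentCount);
    if (mode == ANCHOR_NONE) return segmentScore;  // feature off: all segments equal
    int anchorPosition = (mode == ANCHOR_FIRST) ? 0 : segmentCount - 1;
    if (position == anchorPosition) return segmentScore;  // anchor keeps its full score
    return segmentScore * anchorFactor;  // assumed multiplicative reduction
}
```

With the first segment as anchor and a factor of 0.5, a perfect match in the second position of a two-segment name would count as 0.5, while the same match in the first position keeps its score of 1.0.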



SNAnchorSegMode getGnAnchorSegmentMode ( ) 

void setGnAnchorSegmentMode (SNAnchorSegMode anAnchorMode) 



Gets or sets the given name anchor segment mode. Setting the anchor segment mode 
causes the API to place emphasis on a particular segment within the given name (the 
first segment, or the last segment). When this feature is turned off, all segments are 
considered to be equally important. See the setGnAnchorFactor() method for details on
how the anchor segment affects segment scoring.

The given name anchor segment is also used to determine how segments in two names
are lined up (to determine which segments are in place or out of place). When the
anchor segment is set to SN_ANCHOR_SEG_NONE or
SN_ANCHOR_SEG_FIRST, segment alignment starts from the left (the first
segment). When the anchor segment is set to SN_ANCHOR_SEG_LAST, segment
alignment starts from the right (the last segment). See the setGnOOPSFactor() method
for details on how the API adjusts the score of segments that are out of place.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



anAnchorMode An SNAnchorSegMode value:

SN_ANCHOR_SEG_NONE    No segment carries more importance than
                      another. Name segments are lined up on the
                      left to determine which segment comparisons
                      are in place.

SN_ANCHOR_SEG_FIRST   The first segment is the most important
                      segment. Name segments are lined up on the
                      left to determine which segment comparisons
                      are in place.

SN_ANCHOR_SEG_LAST    The last (rightmost) segment is the most important
                      segment. Name segments are lined up on the
                      right to determine which segment
                      comparisons are in place.



Return Values: 



getGnAnchorSegmentMode() returns the current "given name anchor
segment" mode.



double getGnCompressedNameScore ()

SNReturnCode setGnCompressedNameScore (double aScore)



Gets or sets the score to assign to a successful given name compressed name
comparison. See the setCheckGnCompressedName() method for detail on compressed
name comparisons.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aScore A double value between 0.0 and 1.0 inclusive.



Return Values:

setGnCompressedNameScore() returns an SNReturnCode value
indicating the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_GN_COMPRESSED_NAME_SCORE The specified score is invalid.

getGnCompressedNameScore() returns the current "given name
compressed name" score.



double getGnOOPSFactor () 

SNReturnCode setGnOOPSFactor (double aFactor) 



Gets or sets the given name "out of place segment" factor. This is the factor that is 
applied to a segment score when the two segments are out of place (their ordinal 
positions are different). The given name anchor segment mode
(setGnAnchorSegmentMode()) affects how segment alignment is performed.

To understand how alignment affects in place/out of place determination, consider the
given names "Earl Bob" and "James Earl Bob". If we align these names on the left,
we get:

Name 1:   Earl    Bob
Name 2:   James   Earl   Bob

If we line the names up on the right, we get:

Name 1:           Earl   Bob
Name 2:   James   Earl   Bob

Notice that in the first case, the "Earl" and "Bob" segments are out of place, so we
would apply the given name "out of place segment" factor to their segment scores. In
the second case, because we align on the right, the segments are in place, so their
segment scores are not adjusted.
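The in place/out of place determination in the example above can be reproduced with a short sketch. splitSegments and inPlace are hypothetical helpers, and splitting on whitespace is an assumption about what counts as a segment break.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Splits a name into segments on whitespace.
std::vector<std::string> splitSegments(const std::string& name) {
    std::vector<std::string> segments;
    std::istringstream in(name);
    std::string segment;
    while (in >> segment) segments.push_back(segment);
    return segments;
}

// Returns true if segment i of a and segment j of b fall in the same
// column when the two names are lined up on the left or on the right.
bool inPlace(const std::vector<std::string>& a, std::size_t i,
             const std::vector<std::string>& b, std::size_t j,
             bool alignRight) {
    assert(i < a.size() && j < b.size());
    if (!alignRight) return i == j;  // left alignment: same index from the start
    // Right alignment: same distance from the last segment.
    return (a.size() - 1 - i) == (b.size() - 1 - j);
}
```

For "Earl Bob" and "James Earl Bob", the two "Earl" segments sit at indices 0 and 1. They are out of place under left alignment and in place under right alignment, which matches the two layouts shown above.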



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values:

setGnOOPSFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_GN_OOPS_FACTOR The specified factor is invalid.

getGnOOPSFactor() returns the current given name "out of place
segment" factor.



SNTAQProcessingMode getGnTAQProcessingMode ()

void setGnTAQProcessingMode (SNTAQProcessingMode aMode)



Gets or sets the mode that determines how to process given name TAQ values.

The following modes are supported:

Mode                      Description

SN_TAQ_MODE_IGNORE        The API will not check given name segments to see if they
                          are TAQ values.

SN_TAQ_MODE_JUST_REMOVE   The API will check each given name segment to see if it is
                          a TAQ value. If so, the value is removed as though it never
                          existed.

SN_TAQ_MODE_IGNORE        The API will check each given name segment to see if it is
                          a TAQ value. If so, the segment gets associated with the
                          proper stem segment, and is used in the computation of the
                          stem segment's score.
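The first two modes in the table can be sketched as a simple segment filter. TaqMode and filterSegments are hypothetical names, the TAQ lookup set is a stand-in for whatever tables the API consults, and the third mode (associating a TAQ with its stem segment) is left out because the text gives too little detail to model it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical mirror of the two TAQ processing modes modeled here.
enum TaqMode { TAQ_IGNORE, TAQ_JUST_REMOVE };

// Returns the segments that go on to comparison under the given mode.
std::vector<std::string> filterSegments(const std::vector<std::string>& segments,
                                        const std::set<std::string>& taqValues,
                                        TaqMode mode) {
    if (mode == TAQ_IGNORE) return segments;  // segments are not checked at all
    std::vector<std::string> kept;
    for (std::size_t i = 0; i < segments.size(); ++i)
        if (taqValues.count(segments[i]) == 0)  // drop segments that are TAQ values
            kept.push_back(segments[i]);
    return kept;
}
```

With "DR" registered as a TAQ value, the segments {"DR", "JAMES"} pass through unchanged in the ignore mode and become {"JAMES"} in the just-remove mode.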



See the discussion on TAQs for an explanation of the different types of TAQ values.
See the discussion on TAQ Scoring for information on how TAQs are used to adjust 
segment scores. 

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aMode An SNTAQProcessingMode value of either SN_TAQ_MODE_IGNORE or
SN_TAQ_MODE_JUST_REMOVE.



Return Values: 

getGnTAQProcessingMode() returns the current given name TAQ
processing mode.



double getSnAnchorFactor ()

SNReturnCode setSnAnchorFactor (double aFactor)



Gets or sets the factor to apply to a surname segment score when the two segments are
in place, but their ordinal position is not the anchor segment (as specified with the
setSnAnchorSegmentMode() method).

The anchor factor should be viewed as a way to diminish the importance of a match if
the match occurs between two segments that are not in the anchor segment position.
For example, Hispanic surnames commonly include two segments. The first segment
is the true surname and should therefore be considered the anchor segment. A match
between two segments in the second position is considered to be of less importance
(relative to the first segment), and as such, that segment score is diminished by
applying the anchor factor.

Note that the surname anchor factor is only applied when the two segments are in
place (they are in the same position). Surname segments that are out of place are
adjusted by the surname "out of place segment" score (setSnOOPSFactor()). In
addition, the surname anchor factor is only applied when the surname anchor segment
mode (setSnAnchorSegmentMode()) has been set.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aFactor A double value between 0.0 and 1.0 inclusive. 



Return Values: 

setSnAnchorFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_SN_ANCHOR_FACTOR The specified factor is invalid.

getSnAnchorFactor() returns the current "surname anchor segment"
factor.




SNAnchorSegMode getSnAnchorSegmentMode ()

void setSnAnchorSegmentMode (SNAnchorSegMode anAnchorMode)



Gets or sets the surname anchor segment mode. Setting the anchor segment mode
causes the API to place emphasis on a particular segment within the surname (the first
segment, or the last segment). When this feature is turned off, all segments are
considered to be equally important. See the setSnAnchorFactor() method for details on
how the anchor segment affects segment scoring.

The surname anchor segment is also used to determine how segments in two names
are lined up (to determine which segments are in place or out of place). When the
anchor segment is set to SN_ANCHOR_SEG_NONE or
SN_ANCHOR_SEG_FIRST, segment alignment starts from the left (the first
segment). When the anchor segment is set to SN_ANCHOR_SEG_LAST, segment
alignment starts from the right (the last segment). See the setSnOOPSFactor() method
for details on how the API adjusts the score of segments that are out of place.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



anAnchorMode An SNAnchorSegMode value:

SN_ANCHOR_SEG_NONE    No segment carries more importance
                      than another. Name segments are lined
                      up on the left to determine which
                      segment comparisons are in place.

SN_ANCHOR_SEG_FIRST   The first segment is the most important
                      segment. Name segments are lined up on
                      the left to determine which segment
                      comparisons are in place.

SN_ANCHOR_SEG_LAST    The last (rightmost) segment is the most
                      important segment. Name segments are
                      lined up on the right to determine which
                      segment comparisons are in place.



Return Values:

getSnAnchorSegmentMode() returns the current "surname anchor
segment" mode.



double getSnCompressedNameScore () 

SNReturnCode setSnCompressedNameScore (double aScore) 



Gets or sets the score to assign to a successful surname compressed name comparison.
See the setCheckSnCompressedName() method for detail on compressed name
comparisons.



These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 

aScore A double value between 0.0 and 1.0 inclusive.

Return Values:

setSnCompressedNameScore() returns an SNReturnCode value
indicating the success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_SN_COMPRESSED_NAME_SCORE The specified score is invalid.

getSnCompressedNameScore() returns the current "surname
compressed name" score.



double getSnOOPSFactor ()

SNReturnCode setSnOOPSFactor (double aFactor)



Gets or sets the surname "out of place segment" factor. This is the factor that is applied 
to a segment score when the two segments are out of place (their ordinal positions are 
different). The surname anchor segment mode (setSnAnchorSegmentMode()) affects how
segment alignment is performed.

To understand how alignment affects in place/out of place determination, consider the
surnames "Garcia Gomez" and "Valdez Garcia Gomez". If we align these names on
the left, we get:

Name 1:   Garcia   Gomez
Name 2:   Valdez   Garcia   Gomez

If we line the names up on the right, we get:

Name 1:            Garcia   Gomez
Name 2:   Valdez   Garcia   Gomez

Notice that in the first case, the "Garcia" and "Gomez" segments are out of place, so
we would apply the surname "out of place segment" factor to their segment scores. In
the second case, because we align on the right, the segments are in place, so their
segment scores are not adjusted.



These are advanced methods and should only be used by those with a deep
understanding of name searching issues.



Parameters:

aFactor A double value between 0.0 and 1.0 inclusive.



Return Values:

setSnOOPSFactor() returns an SNReturnCode value indicating the
success of the operation:

SN_SUCCESS The modification was successful.

SN_INVALID_SN_OOPS_FACTOR The specified factor is invalid.

getSnOOPSFactor() returns the current surname "out of place segment"
factor.



SNTAQProcessingMode getSnTAQProcessingMode ()

void setSnTAQProcessingMode (SNTAQProcessingMode aMode) 



Gets or sets the mode that determines how to process surname TAQ values.

The following modes are supported:

Mode                      Description

SN_TAQ_MODE_IGNORE        The API will not check surname segments to see if they are
                          TAQ values.

SN_TAQ_MODE_JUST_REMOVE   The API will check each surname segment to see if it is a
                          TAQ value. If so, the value is removed as though it never
                          existed.

SN_TAQ_MODE_IGNORE        The API will check each surname segment to see if it is a
                          TAQ value. If so, the segment gets associated with the
                          proper stem segment, and is used in the computation of the
                          stem segment's score.

See the discussion on TAQs for an explanation of the different types of TAQ values.
See the discussion on TAQ Scoring for information on how TAQs are used to adjust
segment scores.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters:

aMode An SNTAQProcessingMode value of either SN_TAQ_MODE_IGNORE or
SN_TAQ_MODE_JUST_REMOVE.



Return Values: 



getSnTAQProcessingMode() returns the current surname TAQ
processing mode.



BOOL getUseGnLeftBias () 

void setUseGnLeftBias (BOOL aBool)



Gets or sets the flag that determines if given name segment comparisons should be
biased towards matches that occur at the beginning of the segment. When this feature
is turned on, as we move to the right, matching character pairs are given decreasingly
less credit in calculating a segment score. When this feature is turned off, all matching
character pairs receive full credit, regardless of their position within their respective
segment.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 



aBool A BOOL value of TRUE or FALSE.



Return Values:

getUseGnLeftBias() returns the current value of the flag (TRUE or
FALSE).
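One way to picture left bias is a position-weighted comparison. This sketch is not the API's scoring algorithm: positionalScore is a hypothetical function, it compares characters strictly position by position, and the linearly decreasing weights are an assumption chosen only to show how earlier matches can earn more credit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Scores two segments position by position. With left bias on, the pair at
// position i carries weight (n - i), so earlier matches count for more.
// With it off, every pair carries weight 1.
double positionalScore(const std::string& a, const std::string& b, bool leftBias) {
    std::size_t n = a.size() > b.size() ? a.size() : b.size();
    if (n == 0) return 0.0;
    double earned = 0.0;
    double possible = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double weight = leftBias ? static_cast<double>(n - i) : 1.0;
        possible += weight;
        if (i < a.size() && i < b.size() && a[i] == b[i]) earned += weight;
    }
    assert(possible > 0.0);
    return earned / possible;
}
```

Under these assumed weights, "JOHN" against "JOHX" scores higher with left bias on than "JOHN" against "XOHN", because the mismatch falls at the end of the segment rather than at the start. With left bias off, both pairs score the same.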



BOOL getUseGnVariants ( ) 

void setUseGnVariants (BOOL aBool) 



Gets or sets the flag that determines if given name segment comparisons should check
to see if the two segments are linguistic variants of each other.

The API maintains internal tables that describe relationships between name variants. 
Each variant relationship has an associated score and culture. When comparing two 
segments, the API examines the value of the "use given name variants" flag. If it is 
turned on, the internal variant tables are searched to see if there is a variant relationship 
between the two segments, within the culture associated with this query (as determined 
by the SNQueryParms object used to perform the comparison). There is also a generic 
set of variants that are searched independent of culture. If a variant relationship is 
found, its associated score is assigned to the segment score, and no character based 
comparison is performed. 

At present, the set of variants and their associated scores cannot be modified by the
developer.

These are advanced methods and should only be used by those with a deep
understanding of name searching issues. 



Parameters: 



aBool A BOOL value of TRUE or FALSE. 



Return Values:

getUseGnVariants() returns the current value of the flag (TRUE or
FALSE).
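The variant lookup described for this flag can be sketched with two tables. The names here (SegmentPair, VariantTable, findVariantScore) are hypothetical, the table contents in the usage note are made up, and checking the culture table before the generic table is an assumption. The text only says that both are searched.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

typedef std::pair<std::string, std::string> SegmentPair;
typedef std::map<SegmentPair, double> VariantTable;

// Looks up a variant relationship, first in the table for the query's
// culture and then in the generic table. Returns true and fills score
// when a relationship is found.
bool findVariantScore(const std::map<std::string, VariantTable>& byCulture,
                      const VariantTable& generic, const std::string& culture,
                      const std::string& a, const std::string& b, double& score) {
    const SegmentPair key(a, b);
    std::map<std::string, VariantTable>::const_iterator table = byCulture.find(culture);
    if (table != byCulture.end()) {
        VariantTable::const_iterator hit = table->second.find(key);
        if (hit != table->second.end()) {
            assert(hit->second >= 0.0 && hit->second <= 1.0);
            score = hit->second;
            return true;
        }
    }
    VariantTable::const_iterator hit = generic.find(key);
    if (hit != generic.end()) {
        score = hit->second;
        return true;
    }
    return false;  // caller falls back to a character based comparison
}
```

A caller would use the returned score as the segment score when the function returns true, and run the character based comparison only when it returns false.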



BOOL getUseSnLeftBias ()

void setUseSnLeftBias (BOOL aBool)



Gets or sets the flag that determines if surname segment comparisons should be biased
towards matches that occur at the beginning of the segment. When this feature is
turned on, as we move to the right, matching character pairs are given decreasingly
less credit in calculating a segment score. When this feature is turned off, all matching
character pairs receive full credit, regardless of their position within their respective
segment.

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 

aBool A BOOL value of TRUE or FALSE. 

Return Values: 

getUseSnLeftBias() returns the current value of the flag (TRUE or
FALSE).



BOOL getUseSnVariants ()

void setUseSnVariants (BOOL aBool)



Gets or sets the flag that determines if surname segment comparisons should check to
see if the two segments are linguistic variants of each other.

The API maintains internal tables that describe relationships between name variants.
Each variant relationship has an associated score and culture. When comparing two
surname segments, the API examines the value of the "use surname variants" flag. If it
is turned on, the internal variant tables are searched to see if there is a variant
relationship between the two segments, within the culture associated with this query
(as determined by the SNQueryParms object used to perform the comparison). There
is also a generic set of variants that are searched independent of culture. If a variant
relationship is found, its associated score is assigned to the segment score, and no
character based comparison is performed.

At present, the set of variants and their associated scores can not be modified by the 
developer. 

These are advanced methods and should only be used by those with a deep 
understanding of name searching issues. 



Parameters: 

aBool A BOOL value of TRUE or FALSE.

Return Values:

getUseSnVariants() returns the current value of the flag (TRUE or
FALSE).



SNAPI is a trademark of Language Analysis Systems. All other products mentioned are registered trademarks or 
trademarks of their respective companies. 

Questions or problems regarding this web site should be directed to webmaster@las-inc.com.
Copyright © 1997 Language Analysis Systems. All rights reserved.
Last modified: Friday December 19, 1997.



SNResultsList Class

• Class Overview

• Methods Summary

• Attributes 

• Construction 

• Method Details 



Overview 

The SNResultsList class provides a mechanism to manage the results of a query.
Specifically, an SNResultsList object handles issues of comparing and sorting evaluation
names (SNEvalNameData objects) that have been determined to be matches. In addition, an
SNResultsList object can trim the set of matching names down to the best N names, where
N is specified by the developer.

An SNResultsList object is equipped to manage a single query session (a set of comparisons
between a single query name and one or more evaluation names). To use an SNResultsList
object, the developer must create a new object via the SNResultsList() constructor, and
attach it (via the SNQueryNameData::setResultsList() method) to the query name
(SNQueryNameData) whose results it should manage. As calls are made to
SNEvalNameData's performComp() method, the SNResultsList will manage those
evaluation names that are considered matches (as determined by the
SNEvalNameData::getCompResult() method). After all evaluation names have been
compared to the query, the developer can interrogate the SNResultsList object to determine
the number of matches, and request pointers to the matches themselves (pointers to
SNEvalNameData objects). After all matches have been processed, the developer should
delete the SNResultsList object. A new query should create a new SNResultsList object,
rather than reuse an existing one.

SNResultsList provides two important management functions. First, it sorts matching
SNEvalNameData objects automatically. Sorting is accomplished by invoking the
SNEvalNameData::compareScore() method to determine which matches are better than
others. This provides the developer with a great deal of flexibility, because the
compareScore() method can be overridden to allow for customized sorting behavior. The
default method provides a robust set of comparison criteria, but the developer can alter the
functionality, or incorporate new application specific data into the comparison.

The second important management function is that of results trimming. When constructing
an SNResultsList object, the developer can specify the maximum number of matches the
results list should hold. As matches are added to the results list, only the requested number
of matches are retained. The object ensures that the best matches (as determined by
SNEvalNameData::compareScore()) remain in the results list, and it also handles the
memory management associated with discarding those matches that are "squeezed out" by
better matches.

Most applications will benefit from the functionality of SNResultsList. However, use of an
SNResultsList object is optional. If desired, the developer can provide for their own match
management. For example, an application may choose to examine the return code from
SNEvalNameData's performComp() method directly, rather than depending on an
SNResultsList object for sorting and filtering.
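The sorting and trimming behavior described in this overview can be sketched with a small bounded list. This is an illustration only: Hit, betterHit, and BoundedResults are hypothetical stand-ins, a plain score comparison takes the place of compareScore(), and storing copies in a std::vector is an implementation choice made for the sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a matched evaluation name.
struct Hit {
    std::string name;
    double score;
};

// Orders hits best first; stands in for a compareScore()-style comparison.
bool betterHit(const Hit& a, const Hit& b) { return a.score > b.score; }

class BoundedResults {
public:
    explicit BoundedResults(std::size_t maxHits) : maxHits_(maxHits) {
        assert(maxHits > 0);
    }

    // Inserts a copy of the hit at its sorted position, then drops the
    // worst hit if the list has grown past its limit.
    void addHit(const Hit& hit) {
        std::vector<Hit>::iterator pos =
            std::upper_bound(hits_.begin(), hits_.end(), hit, betterHit);
        hits_.insert(pos, hit);
        if (hits_.size() > maxHits_) hits_.pop_back();
    }

    std::size_t getNumHits() const { return hits_.size(); }
    const Hit& getHitAt(std::size_t index) const { return hits_.at(index); }

private:
    std::size_t maxHits_;
    std::vector<Hit> hits_;
};
```

With a limit of two, adding hits scored 0.70, 0.55, and 0.90 leaves the 0.90 and 0.70 hits in that order, and the 0.55 hit is squeezed out.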



Methods Summary 



Common Methods: 



SNResultsList()   Constructor for the class.

addHit()          Adds a name object to the results list. Called by the API, not the developer.

getHitAt()        Returns the name object at the specified index. Used to retrieve matches at the
                  end of a query session.

getNumHits()      Returns the number of name objects in the results list.

getStatus()       Returns the status of the results list object. Used for error checking and
                  reporting.



Attributes: 



All attributes within the SNResultsList class are protected, and not available to the 
developer. 



Method Details: 

Constructors: 



SNResultsList ( int maxHits) ; 



Constructs a new results list. If maxHits is specified as a number, the list will contain up to
maxHits matches at any given time. Alternatively, maxHits can be specified as the special
constant SN_RESULTS_LIST_SIZE_EXPANDABLE, in which case the results list can
grow to any size (within available memory limitations).

On successful construction, the new results list is empty, and the status of the object is set to
SN_SUCCESS. Use the getStatus() member function to validate successful construction.



Parameters:

maxHits The maximum number of matches the results list can hold at any one time. A
special value of SN_RESULTS_LIST_SIZE_EXPANDABLE indicates that the
list should grow as needed. This parameter must be a number greater than 1, or
the special constant SN_RESULTS_LIST_SIZE_EXPANDABLE.



Return Values:

None. However, the getStatus() member should be called to validate successful
construction.

Memory Management:

The responsibility of deleting an SNResultsList object lies with the developer.
In general, an SNResultsList object should be deleted at the end of the query
session, after all results have been retrieved and processed.

Because SNResultsList makes copies of the SNEvalNameData objects it
manages, the developer may delete the SNEvalNameData objects immediately
after calling SNEvalNameData's performComp() method.

Examples: 

The example below shows the construction of an SNResultsList object, 
and its use in a query session: 



SNEvalNameData  *candidate1;
SNEvalNameData  *candidate2;
SNQueryNameData *queryName;

SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
SNReturnCode     retCode;
SNResultsList   *myResultsList = NULL;

candidate1 = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
candidate2 = new SNEvalNameData(queryParms, "Earl", "Jhonas");
queryName  = new SNQueryNameData(queryParms, "James Earl", "Jones");

myResultsList = new SNResultsList(1); // create a manager for just 1 match
queryName->setResultsList(myResultsList);

candidate1->performComp(queryName);
candidate2->performComp(queryName);

// eval names can be deleted after being compared to the query
delete candidate1;
delete candidate2;

if (myResultsList->getNumHits() > 0) {
    SNEvalNameData *matchName = myResultsList->getHitAt(0);
    printf("best match was %s, %s\n", matchName->getSn(), matchName->getGn());
}
else
    printf("Neither name Matched");

delete myResultsList;
delete queryName;



SNReturnCode addHit(SNEvalNameData *aHit)

Adds an evaluation name object (SNEvalNameData) to the results list. This method is
invoked by the API, and not by the developer. Specifically, it is called during
SNEvalNameData's performComp() method, when an evaluation name is determined to be a
match. This method makes a copy of the name object, and assumes responsibility for its
deletion.

Parameters:

aHit   A pointer to the SNEvalNameData object that should be added to the results list.

Return Values:

An SNReturnCode value indicating the success or failure of the operation:

SN_SUCCESS:                            The operation was successful.

SN_RESULTS_LIST_INSERT_ALLOC_FAILURE:  A memory allocation problem occurred.

SN_RESULTS_ARRAY_NULL_ERROR:           A memory allocation problem occurred.



SNEvalNameData *getHitAt(int anIndex)

Returns a pointer to the SNEvalNameData object at the specified index. The index is 0
based, so getHitAt(0) returns a pointer to the best match.

If anIndex specifies an index that is out of range, the function returns NULL. Applications
generally first call getNumHits() to determine the valid range of index values that can be
supplied to this method.

The SNResultsList object owns the objects it maintains. As evaluation names are added to
the results list, trimming might occur, which can result in the deletion of SNEvalNameData
objects that get "squeezed out" to make room for better matches. As a result, an application
should not rely on the validity of a pointer obtained by getHitAt() after a subsequent call to
SNEvalNameData::performComp(). Similarly, because SNResultsList's destructor deletes
the objects it manages, all pointers obtained by calls to getHitAt() become invalid once the
results list is deleted.

Parameters:

anIndex   The 0 based index of the desired evaluation name object.

Return Values:

A pointer to the SNEvalNameData object at the specified index. If the specified
index is out of range, the function returns NULL.
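
The ownership and trimming rules described for the results list can be pictured with a small stand-alone class. The following is only a sketch under stated assumptions: BoundedResults, ScoredName, and kExpandable are hypothetical names invented for this illustration, and nothing here is taken from the SNAPI sources. It shows one way a best-first list could copy each hit, discard the weakest match once the limit is reached, and report an out-of-range index with a null pointer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a scored match; not an SNAPI type.
struct ScoredName {
    std::string surname;
    std::string givenName;
    double score;
};

// Sketch of a bounded, best-first results list (assumed behaviour).
class BoundedResults {
public:
    // Plays the role of SN_RESULTS_LIST_SIZE_EXPANDABLE in this sketch.
    static const int kExpandable = -1;

    explicit BoundedResults(int maxHits) : maxHits_(maxHits) {}

    // Stores a copy of the hit, keeps the list ordered best-first, and
    // drops the weakest entry if the list has grown past maxHits.
    void addHit(const ScoredName &hit) {
        std::vector<ScoredName>::iterator pos = hits_.begin();
        while (pos != hits_.end() && pos->score >= hit.score) ++pos;
        hits_.insert(pos, hit);
        if (maxHits_ != kExpandable &&
            hits_.size() > static_cast<std::size_t>(maxHits_)) {
            hits_.pop_back(); // the "squeezed out" match
        }
    }

    int getNumHits() const { return static_cast<int>(hits_.size()); }

    // Returns a null pointer when the index is out of range.
    const ScoredName *getHitAt(int index) const {
        if (index < 0 || index >= getNumHits()) return nullptr;
        return &hits_[static_cast<std::size_t>(index)];
    }

private:
    int maxHits_;
    std::vector<ScoredName> hits_;
};
```

In this sketch, any pointer returned by getHitAt() may dangle after the next addHit() call, because the underlying storage can move or shrink. That mirrors the warning above about relying on a pointer after a subsequent comparison.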



Examples: 



The example below shows a sample query session using an SNResultsList
object. Notice that we call getHitAt() to retrieve the best match:

SNEvalNameData  *candidate1;
SNEvalNameData  *candidate2;
SNQueryNameData *queryName;

SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
SNReturnCode     retCode;
SNResultsList   *myResultsList = NULL;

candidate1 = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
candidate2 = new SNEvalNameData(queryParms, "Earl", "Jhonas");
queryName  = new SNQueryNameData(queryParms, "James Earl", "Jones");

myResultsList = new SNResultsList(1); // create a manager for just 1 match
queryName->setResultsList(myResultsList);

candidate1->performComp(queryName);
candidate2->performComp(queryName);

delete candidate1;
delete candidate2;

if (myResultsList->getNumHits() > 0) {
    SNEvalNameData *matchName = myResultsList->getHitAt(0);
    printf("best match was %s, %s\n", matchName->getSn(), matchName->getGn());
}
else
    printf("Neither name Matched");

delete myResultsList;
delete queryName;



int getNumHits()



Returns the number of matches in the results list. This is NOT the number of matches the
list is capable of holding, but is the number of matches available for the user to retrieve via
the getHitAt() method.

Parameters:

None.

Return Values:

The number of matches in the results list.
Examples:

The example below shows a sample query session using an SNResultsList
object. Notice that we call getNumHits() to make sure we have a hit
to retrieve:

SNEvalNameData  *candidate1;
SNEvalNameData  *candidate2;
SNQueryNameData *queryName;

SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
SNReturnCode     retCode;
SNResultsList   *myResultsList = NULL;

candidate1 = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
candidate2 = new SNEvalNameData(queryParms, "Earl", "Jhonas");
queryName  = new SNQueryNameData(queryParms, "James Earl", "Jones");

myResultsList = new SNResultsList(1); // create a manager for just 1 match
queryName->setResultsList(myResultsList);

candidate1->performComp(queryName);
candidate2->performComp(queryName);

delete candidate1;
delete candidate2;

if (myResultsList->getNumHits() > 0) {
    SNEvalNameData *matchName = myResultsList->getHitAt(0);
    printf("best match was %s, %s\n", matchName->getSn(), matchName->getGn());
}
else
    printf("Neither name Matched");

delete myResultsList;
delete queryName;



SNReturnCode getStatus()



Returns the status of the results list object. This method is for error checking purposes, and
is usually called after an attempt to construct an SNResultsList object.

Parameters:

None.

Return Values:

An SNReturnCode value indicating the status of the object:

SN_SUCCESS:                        Construction was successful.

SN_RESULTS_LIST_ALLOCATION_ERROR:  A memory allocation problem occurred.

SN_INVALID_RESULTS_LIST_SIZE:      An invalid results list size was specified. The
                                   results list size specifier must be a number
                                   greater than 1, or the special constant
                                   SN_RESULTS_LIST_SIZE_EXPANDABLE.

Examples:

The example below shows a sample query session using an SNResultsList
object. Notice that we call getStatus() to make sure our results list
was created properly:

SNEvalNameData  *candidate1;
SNEvalNameData  *candidate2;
SNQueryNameData *queryName;
SNQueryParms    *queryParms = new SNQueryParms(SN_PARMS_GENERIC);
SNReturnCode     retCode;
SNResultsList   *myResultsList = NULL;

myResultsList = new SNResultsList(1); // create a manager for just 1 match

// only run the query if the results list was constructed successfully
if (myResultsList->getStatus() == SN_SUCCESS) {
    candidate1 = new SNEvalNameData(queryParms, "Bob Earl", "Jones");
    candidate2 = new SNEvalNameData(queryParms, "Earl", "Jhonas");
    queryName  = new SNQueryNameData(queryParms, "James Earl", "Jones");

    queryName->setResultsList(myResultsList);

    candidate1->performComp(queryName);
    candidate2->performComp(queryName);

    delete candidate1;
    delete candidate2;

    if (myResultsList->getNumHits() > 0) {
        SNEvalNameData *matchName = myResultsList->getHitAt(0);
        printf("best match was %s, %s\n", matchName->getSn(), matchName->getGn());
    }
    else
        printf("Neither name Matched");
    delete queryName;
}
else
    printf("Error creating results list\n");
delete myResultsList;



SNAPI is a trademark of Language Analysis Systems. All other products mentioned are registered trademarks or

SNAPI Functions Documentation

The SNAPI API provides a small number of functions that the developer may find useful.

SN_get_error_text()   Returns the message text for a specified SNAPI return code.

SN_shutdown()         Releases global resources allocated by the SNAPI system.

SN_startup()          Forces allocation of SNAPI global resources. If this function is not called, the
                      resources will be allocated during the first query.

SN_strip()            A utility function to remove leading and trailing white space from a NULL
                      terminated string.

SN_strrchr()          A utility function that searches backwards in a string for a specified character.
                      Differs from strrchr() in that the string does not have to be NULL terminated.



void SN_get_error_text(SNReturnCode errorCode, char *textBuffer, int maxChars)



Retrieves the message text associated with a SNAPI return code. See the associated
documentation on SNReturnCode for a list of possible error codes.

Parameters:

errorCode    The SNReturnCode value for which text is to be retrieved.

textBuffer   A buffer to hold the message text.

maxChars     The size of textBuffer (minus 1 for the NULL terminator).

Return Values:

None. On return, textBuffer contains the message text.
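
The relationship between maxChars and the size of textBuffer is easy to get wrong by one, so a small illustration may help. The sketch below is not the SNAPI implementation: SketchCode, sketchMessage, and sketchGetErrorText are hypothetical names, and the message strings are placeholders. It assumes the function copies at most maxChars characters and then writes the terminator, which is why the caller's buffer needs maxChars + 1 bytes.

```cpp
#include <cstring>

// Hypothetical codes and messages for illustration; not the SNAPI values.
enum SketchCode { SKETCH_OK = 0, SKETCH_BAD_SIZE = 1 };

static const char *sketchMessage(SketchCode code) {
    switch (code) {
        case SKETCH_OK:       return "Operation was successful.";
        case SKETCH_BAD_SIZE: return "Invalid size requested for results list.";
    }
    return "Unknown return code.";
}

// Copies at most maxChars characters into textBuffer and always writes a
// terminating '\0', so textBuffer must hold at least maxChars + 1 bytes.
void sketchGetErrorText(SketchCode code, char *textBuffer, int maxChars) {
    if (textBuffer == nullptr || maxChars < 0) return;
    std::strncpy(textBuffer, sketchMessage(code),
                 static_cast<std::size_t>(maxChars));
    textBuffer[maxChars] = '\0'; // terminate even when the text was cut short
}
```

A caller following this convention would declare `char buf[1000 + 1];` and pass 1000 as maxChars, which is the pattern the examples in this documentation use.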
Examples:

The example below shows a failed attempt to create an SNResultsList,
and a subsequent call to SN_get_error_text():

SNResultsList *myResultsList = NULL;
SNReturnCode   retCode;

myResultsList = new SNResultsList(-4); // <-- invalid call
retCode = myResultsList->getStatus();

if (retCode != SN_SUCCESS) {
    char msgBuf[1000 + 1];
    SN_get_error_text(retCode, msgBuf, 1000); // fetch the text for the failure code
    printf("Problem creating results list, msg was: %s\n", msgBuf);
}
else
    printf("Results list created OK.");
delete myResultsList;



void SN_shutdown()

Releases global resources that have been allocated by the SNAPI API.

The SNAPI API uses several lookup tables and similar resources. When the application
exits, the operating system releases these resources, as it does with all resources associated
with the process. However, many debugging environments check for memory leaks just
before an application exits. To prevent debugging messages of this nature, call the
SN_shutdown() function just before your application exits.



Parameters: 

None. 



Return Values: 
None. 



void SN_startup()



Allocates the global resources required by the SNAPI API.

The SNAPI API uses several lookup tables and similar resources. Each time one of these
resources is needed, a check is made to see if the resource has been created. If not, the
resource is created at that time. This may result in a slight delay the first time a resource is
required.

By calling the SN_startup() function, all global resource allocation can be controlled by the
developer.

Parameters:

None.

Return Values:

None.



void SN_strip(char *aString)

Alters the supplied string by removing leading and trailing whitespace.

Parameters:

aString   A NULL terminated string.

Return Values:

None. On return, aString has been stripped.
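
An in-place strip of this kind can be sketched in a few lines. The function below is an illustration only: stripInPlace is a hypothetical name, and the use of std::isspace() to decide what counts as whitespace is an assumption (the library's own whitespace set may differ).

```cpp
#include <cctype>
#include <cstring>

// In-place strip of leading and trailing whitespace from a
// NUL-terminated string. A sketch, not the SNAPI implementation.
void stripInPlace(char *s) {
    if (s == nullptr) return;

    // Skip leading whitespace.
    char *first = s;
    while (*first != '\0' && std::isspace(static_cast<unsigned char>(*first)))
        ++first;

    // Back up over trailing whitespace; 'last' is one past the final kept char.
    char *last = first + std::strlen(first);
    while (last > first && std::isspace(static_cast<unsigned char>(last[-1])))
        --last;

    // Shift the kept characters to the front. The regions may overlap,
    // so memmove is used rather than memcpy.
    std::size_t len = static_cast<std::size_t>(last - first);
    std::memmove(s, first, len);
    s[len] = '\0';
}
```

Because the string is modified where it sits, the argument must be writable storage such as a char array, not a string literal.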



char *SN_strrchr(char *stringStart, char *searchPos, char searchChar)

Searches backwards through a string, looking for the specified search character. This
function is similar to strrchr(), except that we also specify the position in the string to start
searching, rather than assuming the string is NULL terminated and starting at the end.

If the search character is not found (we reach stringStart without finding the search
character), we return NULL. Otherwise, we return a pointer to the first occurrence of the search
character we come across.

Parameters:

stringStart   A pointer to the start of the string to be searched.

searchPos     A pointer to the character to begin the reverse search. This should point
              to a character to the right of the stringStart.

searchChar    The character we are trying to find.

Return Values:

A pointer to the first occurrence of searchChar we find in our reverse search, or
NULL if we reach stringStart without finding searchChar.
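
The reverse search can be pictured as a simple backwards walk. The sketch below is illustrative only: reverseFind is a hypothetical name, and it assumes that both searchPos and stringStart are themselves examined (the description above does not state whether the end points are inclusive).

```cpp
// Walks left from searchPos to stringStart, both inclusive (an assumption),
// and returns a pointer to the first match met, or a null pointer if none.
// The buffer does not need to be NUL-terminated.
const char *reverseFind(const char *stringStart, const char *searchPos,
                        char searchChar) {
    if (stringStart == nullptr || searchPos == nullptr) return nullptr;
    if (searchPos < stringStart) return nullptr; // nothing to search

    const char *p = searchPos;
    while (true) {
        if (*p == searchChar) return p;
        if (p == stringStart) return nullptr; // reached the start, no match
        --p;
    }
}
```

The loop tests for the start of the buffer before decrementing, so the pointer never moves to the left of stringStart.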
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SNAPI Data Types and Enumerations Documentation 

The SNAPI API defines a number of data types and enumerations that are used throughout the system:

Data Types

BOOL   A Boolean data type. The variable may take the pre-defined values TRUE or FALSE. This
       data type is provided to appease C++ compilers that do not yet support the standard bool
       data type.

word   A short int.



Enumerated Types 
SNNameFormat

Indicates the format of a single string name when constructing a name.
See the SNEvalNameData::SNEvalNameData() and
SNQueryNameData::SNQueryNameData() constructors for details on
how these values are interpreted. Possible values are listed below.

SN_SURNAME_COMMA_GIVENNAME   The name is in "surname, given name" format.

SN_LAST_SEG_IS_SURNAME       The last (right most) segment is the surname.
                             TAQ values are removed from consideration when
                             determining the last segment.

SN_NAME_FORMAT_UNKNOWN       The name format is unknown.



SNParmsType



Specifies a particular parameter type when constructing a new
SNQueryParms object. See the SNQueryParms::SNQueryParms()
constructor for additional details. Possible values are listed below.

SN_PARMS_GENERIC    Specifies a set of parameters appropriate for
                    searching Anglo (English) names, or names
                    of unknown or mixed ethnicity.

SN_PARMS_ARABIC     Specifies a set of parameters appropriate for
                    searching Arabic names.

SN_PARMS_CHINESE    Specifies a set of parameters appropriate for
                    searching Chinese names.

SN_PARMS_HISPANIC   Specifies a set of parameters appropriate for
                    searching Hispanic names.

SN_PARMS_KOREAN     Specifies a set of parameters appropriate for
                    searching Korean names.

SN_PARMS_RUSSIAN    Specifies a set of parameters appropriate for
                    searching Russian names.



SNSegScoreMode

Specifies the segment score mode when adjusting an SNQueryParms
object via the setGnSegmentScoreMode() or setSnSegmentScoreMode()
methods. See either of these methods for additional details. Possible
values are listed below.

SN_SEGMODE_HIGHEST   The score assigned to the name field is
                     the highest score found when comparing
                     the segments.

SN_SEGMODE_LOWEST    The score assigned to the name field is
                     the best low score found when
                     comparing the segments.

SN_SEGMODE_AVG       The score assigned to the name field is
                     the best average score found when
                     comparing the segments.



See either of these methods for additional

details. Possible values are listed below.

SN_TAQ_MODE_IGNORE        The API will not perform any TAQ processing on the
                          name field.

SN_TAQ_MODE_JUST_REMOVE   The API will remove any TAQ values from the name
                          field, but will not adjust any scores.

                          The API will remove any TAQ values from the name
                          field, and will adjust segment scores accordingly.



SN_TAQ_MODEJGNORE 



SNAnchorSegMode

Specifies the anchor segment when adjusting an SNQueryParms object
via the setGnAnchorSegmentMode() or setSnAnchorSegmentMode()
methods. See either of these methods for additional details. Possible
values are listed below.

SN_ANCHOR_SEG_NONE    No segment carries more importance
                      than another. Name segments are lined
                      up on the left to determine which
                      segment comparisons are in place.

SN_ANCHOR_SEG_FIRST   The first segment is the most important
                      segment. Name segments are lined up
                      on the left to determine which segment
                      comparisons are in place.

SN_ANCHOR_SEG_LAST    The last (right most) segment is the
                      most important segment. Name
                      segments are lined up on the right to
                      determine which segment comparisons
                      are in place.
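
The phrase "lined up on the left" or "on the right" can be illustrated with a short pairing routine. This is a sketch under an assumed interpretation, namely that an "in place" comparison is one between segments occupying the same position once the two names are aligned at the chosen edge. Alignment, SegmentPair, and inPlacePairs are hypothetical names and are not part of SNAPI.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum Alignment { ALIGN_LEFT, ALIGN_RIGHT };

typedef std::pair<std::string, std::string> SegmentPair;

// Pairs up segments that share a position after aligning the two names on
// the left edge or on the right edge. Segments of the longer name that have
// no counterpart are simply left unpaired in this sketch.
std::vector<SegmentPair> inPlacePairs(const std::vector<std::string> &a,
                                      const std::vector<std::string> &b,
                                      Alignment alignment) {
    std::vector<SegmentPair> pairs;
    std::size_t n = std::min(a.size(), b.size());
    // For right alignment, skip the extra leading segments of the longer name.
    std::size_t offA = (alignment == ALIGN_RIGHT) ? a.size() - n : 0;
    std::size_t offB = (alignment == ALIGN_RIGHT) ? b.size() - n : 0;
    for (std::size_t i = 0; i < n; ++i)
        pairs.push_back(SegmentPair(a[offA + i], b[offB + i]));
    return pairs;
}
```

With the segments {"Maria", "Ferreira", "Santos"} and {"Ferreira", "Santos"}, left alignment pairs "Maria" with "Ferreira", while right alignment pairs the two final segments with their exact counterparts. That is why the choice of edge matters when the last segment is the anchor.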



SNReturnCode

A set of return codes. Each return code specifies a particular condition.
Many functions within the API return a variable of type
SNReturnCode. The global function SN_get_error_text() can be used to
retrieve a textual description of the code. Each code's meaning is also
documented below.

Operation 



SN_MATCH                                     The comparison resulted in a match.

SN_NO_MATCH                                  The comparison did not result in a match.

SN_INVALID_SCORE_THRESH                      Bad score threshold value.

SN_INVALID_GN_INIT_SCORE                     Bad given name Initial score.

SN_INVALID_SN_INIT_SCORE                     Bad surname Initial score.

SN_INVALID_GN_INIT_ON_INIT_MATCH_SCORE       Bad given name "Exact Initial match" score.

SN_INVALID_SN_INIT_ON_INIT_MATCH_SCORE       Bad surname "Exact Initial match" score.

SN_INVALID_NFN_SCORE                         Bad "No First Name" score.

SN_INVALID_FNU_SCORE                         Bad "First Name Unknown" score.

SN_INVALID_NLN_SCORE                         Bad "No Last Name" score.

SN_INVALID_LNU_SCORE                         Bad "Last Name Unknown" score.

SN_INVALID_GN_ANCHOR_FACTOR                  Bad given name anchor factor.

SN_INVALID_SN_ANCHOR_FACTOR                  Bad surname anchor factor.

SN_INVALID_GN_OOPS_FACTOR                    Bad given name "Out of Place Segment" factor.

SN_INVALID_SN_OOPS_FACTOR                    Bad surname "Out of Place Segment" factor.

SN_INVALID_ABS_DEL_GN_TAQ_FACTOR             Bad given name "absent delete TAQ" factor.

SN_INVALID_ABS_DEL_SN_TAQ_FACTOR             Bad surname "absent delete TAQ" factor.

SN_INVALID_ABS_DIS_GN_TAQ_FACTOR             Bad given name "absent disregard TAQ" factor.

SN_INVALID_ABS_DIS_SN_TAQ_FACTOR             Bad surname "absent disregard TAQ" factor.

SN_INVALID_DEL_GN_TAQ_FACTOR                 Bad given name "delete TAQ" factor.

SN_INVALID_DEL_SN_TAQ_FACTOR                 Bad surname "delete TAQ" factor.

SN_INVALID_DIS_GN_TAQ_FACTOR                 Bad given name "disregard TAQ" factor.

SN_INVALID_DIS_SN_TAQ_FACTOR                 Bad surname "disregard TAQ" factor.

SN_INVALID_GN_COMPRESSED_NAME_SCORE          Bad given name "compressed name" score.

SN_INVALID_SN_COMPRESSED_NAME_SCORE          Bad surname "compressed name" score.

SN_RESULTS_LIST_INSERT_ALLOC_FAILURE         Could not alloc space for new hit in the results list.

SN_GN_VAR_TABLE_CREATION_ERROR               Could not create GN variant table.

SN_SN_VAR_TABLE_CREATION_ERROR               Could not create SN variant table.

SN_TAQ_TABLE_CREATION_ERROR                  Could not create TAQ table.

SN_SEG_BREAK_CHARS_CREATION_ERROR            Could not create seg break chars string.

SN_NOISE_CHARS_CREATION_ERROR                Could not create noise chars string.

SN_INVALID_RESULTS_LIST_SIZE                 Invalid size requested for results list.

SN_RESULTS_LIST_ALLOCATION_ERROR             Could not allocate initial space for results list.

SN_RESULTS_ARRAY_NULL_ERROR                  The internal results list array is NULL.

SN_TAQ_RECORD_ALLOC_ERROR                    Problem allocating space for a new TAQ record.

SN_VARIANT_ALLOC_ERROR                       Problem allocating space for a new variant record.

SN_VARIANTS_DONT_EXIST                       An attempt was made to alter the score of a
                                             relationship that did not already exist.

SN_INVALID_VARIANT_SCORE                     An invalid score was specified for a variant
                                             relationship.

SN_TOO_MANY_VARIANTS_FOR_NAME                The maximum number of variants per name has been
                                             exceeded for a name.

SN_VARIANT_ALREADY_RELATED                   An attempt was made to relate two names that were
                                             already related.

SN_PARMS_FILE_OPEN_ERROR                     Problem opening a parms file.

SN_PARMS_FILE_NOISE_CHARS_ERROR              Problem reading noise chars from a parameters file.

SN_PARMS_FILE_BREAKS_CHARS_ERROR             Problem reading break chars from parameters file.

SN_TAQ_NOT_FOUND                             The specified TAQ could not be found.

SN_TAQ_ALREADY_EXISTS                        The specified TAQ is already defined.

SN_INVALID_GN_THRESH                         The specified GN Thresh is invalid.

SN_INVALID_SN_THRESH                         The specified SN Thresh is invalid.

SN_INVALID_GN_WEIGHT                         The specified GN Weight is invalid.

SN_INVALID_SN_WEIGHT                         The specified SN Weight is invalid.

SN_INVALID_CULTURE_CODE                      The specified Culture Code is invalid.

SN_ERROR_READING_CUSTOM_PARAMETER_FROM_FILE  An error occurred while reading a custom
                                             parameter from a file.

SN_ERROR_WRITING_CUSTOM_PARAMETER_TO_FILE    An error occurred while writing a custom
                                             parameter to a file.



SNAPI Constants Documentation

The SNAPI API defines a number of constants that are used throughout the system:

Name                              Value        Description

EOS                               '\0'         The end of string marker.

FALSE                             0            Boolean constant.

SN_DEFAULT_NOISE_CHARS                         The default set of characters that are treated as
                                               though they do not exist in a name.

SN_DEFAULT_SEG_DELIM_CHARS                     The default set of characters that act as segment
                                               delimiters.

SN_DEFAULT_WHITESPACE             " \n\r\t"    The set of characters considered to be whitespace.

SN_MAX_GN_LEN                     255          Max length of the given name. If a given name is
                                               longer, truncation occurs.

SN_MAX_MN_LEN                     255          Max length of the middle name. If a middle name is
                                               longer, truncation occurs.

SN_MAX_SEG_LENGTH                 30           Max length of a single name segment. If a segment
                                               is longer, truncation occurs.

SN_MAX_SEGS_AFTER_TAQ                          Max number of segments per name field after TAQ
                                               removal. Any segments after the maximum are
                                               disregarded.

SN_MAX_SEGS_BEFORE_TAQ            10           Max number of segments per name field before TAQ
                                               removal. Any segments after the maximum are
                                               disregarded.

SN_MAX_SN_LEN                     255          Max length of the surname. If a surname is longer,
                                               truncation occurs.

SN_MAX_TAQS_PER_SEGMENT           5            Max number of TAQs that can be associated with one
                                               segment. Any TAQs over the maximum are
                                               disregarded.

SN_RESULTS_LIST_SIZE_EXPANDABLE   -1           Specifies that the results list should grow as
                                               needed.

TRUE                                           Boolean constant.



SNAPI Vocabulary and Terms

In talking about names and name searching, a definition of some vocabulary can be helpful. Below, we
present a brief discussion of some of the terms and concepts found throughout the documentation:

Name             Description

Anchor Segment   The name segment within a name field that is considered the most important. Most
                 cultures do not have an anchor segment, because all segments are considered equally
                 important. However, some cultures place emphasis on a particular segment. For
                 example, the first segment of an Arabic given name (e.g., Mohammaed bin Salam) is
                 considered most important and the second segment of a Lusophone or Portuguese
                 surname (e.g., Ferreira Dos Santos) is considered most important.

Given Name       The portion of a name that does NOT reference the family name. In Anglo (English)
                 names, this is typically the first name.

Name Field       The given name or surname portion of a name, considered separately. The SNAPI
                 name model currently uses just these two name fields.

Name Segment     Any portion of a name that is separated by a segment delimiter. The most common
                 segment delimiter is a space. For example, the name "James Earl Jones" has three
                 segments: "James", "Earl", and "Jones".

Name Variant     Linguistic variants include a wide variety of motivated variations of a name,
                 including nicknames, abbreviations, phonetic variants, and cultural variants, among
                 others. Common Anglo (English) nicknames include Jack/John and Bill/William.
                 The SNAPI system currently uses a table of name variants organized by culture to
                 provide special handling for these names.

Stem             A stem is a non-TAQ segment. Stem segments are considered to be part of the actual
                 name, while TAQ values are adjuncts to the name. Stem segments receive segment
                 scores, while TAQ segments do not (rather, they are used to adjust their associated
                 stem's segment score).

Surname          The portion of a name that describes a person's family (i.e., family name). In Anglo
                 (English) names, this is typically the last name.

TAQ              An acronym that stands for Titles, Affixes (prefixes and suffixes) and Qualifiers.
                 TAQ values can be thought of as name modifiers. Common Anglo (English)
                 examples include "Jr" and "Dr". The SNAPI system uses a table of TAQ values
                 organized by culture to provide special handling for these modifiers.

                 TAQ values are broken up into two groups. Delete TAQ values are modifiers that do
                 not provide any true meaning regarding a person's identity. Examples of Delete
                 TAQs include "Mr" and "Dr". Disregard TAQ values do provide extra information.
                 Examples of Disregard TAQs include "Jr" and "De".

                 Each TAQ value is also classified as either a prefix or a suffix. This classification is
                 used to determine the TAQ's associated stem segment.
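
The idea of a name segment can be made concrete with a short splitting routine. The sketch below is an assumption-laden illustration, not the library's code: splitSegments is a hypothetical name, a single space is assumed as the only delimiter by default, and the limits of 10 segments and 30 characters per segment are borrowed from the SN_MAX_SEGS_BEFORE_TAQ and SN_MAX_SEG_LENGTH constants only to show where truncation could apply.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Splits a name field into segments at any of the delimiter characters.
// Empty segments (runs of delimiters) are skipped, segments longer than
// maxSegLen are truncated, and segments beyond maxSegs are disregarded.
std::vector<std::string> splitSegments(const std::string &nameField,
                                       const std::string &delims = " ",
                                       std::size_t maxSegs = 10,
                                       std::size_t maxSegLen = 30) {
    std::vector<std::string> segs;
    std::size_t pos = 0;
    while (segs.size() < maxSegs) {
        std::size_t start = nameField.find_first_not_of(delims, pos);
        if (start == std::string::npos) break; // only delimiters remain
        std::size_t end = nameField.find_first_of(delims, start);
        if (end == std::string::npos) end = nameField.size();
        segs.push_back(nameField.substr(start, std::min(end - start, maxSegLen)));
        pos = end;
    }
    return segs;
}
```

Calling `splitSegments("James Earl Jones")` yields the three segments named in the Name Segment entry above. Deciding which segments are stems and which are TAQ values would be a separate, table-driven step that this sketch does not attempt.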
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TAQ Scoring 



TAQ processing is one of the most complicated aspects of the API. This section provides a
condensed overview of how TAQ values are processed and used to adjust segment scores.
In general, most applications do not need to be concerned with these advanced issues.

TAQ factors help address issues that arise when comparing names such as "James Brown
Jr" and "James Brown Sr". The API has the ability to recognize TAQ values such as "Jr",
"Sr", and "Dr". The API maintains an internal table of TAQ values, organized by culture
(there is also a set of generic TAQ values that span all cultures). As a name is processed,
each name segment is examined to determine if it is a TAQ value. The culture associated
with the SNQueryParms object used to create the name object determines which culture's
TAQ values should be considered (in addition to the generic set). Note that TAQ values are
restricted to single segments.

Each TAQ value is classified as either a prefix or suffix in order to determine the stem
segment with which a particular TAQ value should be associated. Once associated with a
stem segment, the TAQ value is removed from the name (no segment score is generated for
the TAQ value). However, after each remaining stem segment has received a score, its
associated TAQ(s) are examined and scores are adjusted according to specific rules.

TAQ scoring occurs after each segment has received a segment score. The process of
adjusting segment scores based on associated TAQ values is quite complicated for two
reasons. First, each segment can have multiple associated TAQ values. Second, the
associated TAQ values can be of mixed type (Disregard and/or Delete). The following table
describes the algorithm employed when scoring associated TAQ values for two
segments:



Step   Description

1      If neither segment has an associated disregard value, proceed to step 5.

2      If the same disregard value is associated with both segments, proceed to step 5.

3      If both names have at least one disregard value, but none are common to both, apply
       the "disregard TAQ" factor and stop.

4      If one name has one or more disregard values, but the other has none, apply the
       "absent disregard TAQ" factor and stop.

5      If neither segment has an associated delete value, stop (do not modify the segment
       score).

6      If the same delete value is associated with both segments, stop (do not modify the
       segment score).

7      If both names have at least one delete value, but none are common to both, apply the
       "delete TAQ" factor and stop.

8      If one name has one or more delete values, but the other has none, apply the "absent
       delete TAQ" factor and stop.
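
The eight steps above amount to a small decision procedure, which can be sketched as follows. This is an illustration only: TaqAdjustment, TaqSet, shareValue, and chooseTaqAdjustment are hypothetical names, and the sketch reports which factor would apply without modelling how the factor changes the segment score, because the table does not say.

```cpp
#include <set>
#include <string>

enum TaqAdjustment {
    TAQ_NO_ADJUSTMENT,
    TAQ_DISREGARD_FACTOR,
    TAQ_ABSENT_DISREGARD_FACTOR,
    TAQ_DELETE_FACTOR,
    TAQ_ABSENT_DELETE_FACTOR
};

typedef std::set<std::string> TaqSet;

// True when at least one TAQ value appears in both sets.
static bool shareValue(const TaqSet &a, const TaqSet &b) {
    for (TaqSet::const_iterator it = a.begin(); it != a.end(); ++it)
        if (b.count(*it) > 0) return true;
    return false;
}

// Decides which TAQ factor applies to a pair of segments, given the
// disregard and delete TAQ values associated with each segment.
TaqAdjustment chooseTaqAdjustment(const TaqSet &disregardA, const TaqSet &deleteA,
                                  const TaqSet &disregardB, const TaqSet &deleteB) {
    // Steps 1 and 2: fall through to the delete checks when neither side has
    // a disregard value, or when the two sides share one.
    bool anyDisregard = !disregardA.empty() || !disregardB.empty();
    if (anyDisregard && !shareValue(disregardA, disregardB)) {
        if (!disregardA.empty() && !disregardB.empty())
            return TAQ_DISREGARD_FACTOR;        // step 3
        return TAQ_ABSENT_DISREGARD_FACTOR;     // step 4
    }

    // Steps 5 and 6: no delete values at all, or a shared delete value.
    bool anyDelete = !deleteA.empty() || !deleteB.empty();
    if (!anyDelete || shareValue(deleteA, deleteB))
        return TAQ_NO_ADJUSTMENT;

    if (!deleteA.empty() && !deleteB.empty())
        return TAQ_DELETE_FACTOR;               // step 7
    return TAQ_ABSENT_DELETE_FACTOR;            // step 8
}
```

Note that disregard values are tested first, so a mismatch there decides the outcome before any delete values are examined.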
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Programmer's Tutorial 

Welcome to the programmer's tutorial. This is a good place to start if you have never worked with the 
API. We will present some very simple code examples here, and explain how they work. From here, you 
can check out the API Documentation or the FAQ for greater detail and discussions of more advanced 
topics. 



Getting Started 

To write a program using the API, you will need the following: 

• A C++ compiler.

• The SNAPI header files (distributed with SNAPI). 

• A library file appropriate for your compilation environment (distributed with SNAPI). 

The header files are necessary so that your compiler is aware of the objects and functions that SNAPI 
provides. The library is necessary so that your linker can resolve the external references to the SNAPI 
objects and functions. In addition, you must instruct your compiler and linker as to the location of these 
files. This process is specific to each development environment. Please consult your compiler 
documentation for instructions. 



A Simple Application 

The following is a simple application that takes two names (specified on the command line), and
performs a comparison between the two. We will assume the names are in "surname, given name"
format. Following the source code, each line that uses something from SNAPI is explained in detail.
Note that the line numbers at the beginning of each line are for illustration only.



1  #include <snapi.h>
2  #include <stdio.h>
3  #include <stdlib.h>
4  int main(int argc, char *argv[])
5  {
6      char *name1 = argv[1];
7      char *name2 = argv[2];
8      SNQueryParms queryParms(SN_PARMS_GENERIC);
9      SNQueryNameData *queryNameObject = new SNQueryNameData(&queryParms, name1, SN_SURNAME_COMMA_GIVENNAME);
10     SNEvalNameData *evalNameObject = new SNEvalNameData(&queryParms, name2, SN_SURNAME_COMMA_GIVENNAME);
11     SNReturnCode retCode;
12     retCode = evalNameObject->performComp(queryNameObject);
13     if ((retCode == SN_MATCH) || (retCode == SN_NO_MATCH)) {
14         if (retCode == SN_MATCH)
15             printf("Names Matched\n");
16         else
17             printf("Names Did Not Match\n");
18         printf("Name Score was %f, GN Score was %f, SN score was %f\n",
                  evalNameObject->getNameScore(),
                  evalNameObject->getGnScore(),
                  evalNameObject->getSnScore());
19     }
20     else {
21         char errorBuffer[1000 + 1];
22         SN_get_error_text(retCode, errorBuffer, 1000);
23         printf("An error occurred\n");
24         printf("Error text is %s\n", errorBuffer);
25     }
26     delete queryNameObject;
27     delete evalNameObject;
28     exit(0);
29 }
30 // end of program



Line 1 includes the snapi.h header file. This file should be included in any source file that references 
SNAPI objects, functions or data types. 

Lines 6 and 7 assign pointers to the names specified on the command line. This is done for clarity. 
Remember that this program obtains the names it will compare via command line arguments. 

Line 8 creates a new SNQueryParms object. This object encapsulates all the parameters that control how
names are processed and how comparisons between names are performed. An SNQueryParms object is
created by specifying an SNParmsType value, which identifies a particular culture. The resulting object
contains parameters appropriate for the specified culture. Our application requests parameters for a
generic search. A typical application might adjust a small number of these parameters; for simplicity,
this application uses the defaults. The application is responsible for deleting any
SNQueryParms objects it creates. Because we have created our SNQueryParms object on the stack, it
is destroyed automatically when the main() function exits.

Line 9 creates a new SNQueryNameData object. When creating the object, we must specify an
SNQueryParms, a name string, and an SNNameFormat variable that tells the constructor how to
interpret the name. Again, for our example, we are assuming a format of "surname, given name".
SNQueryNameData also includes other constructors (e.g. one that receives the given name and
surname as separate variables). The SNQueryParms object tells the API how to process certain aspects
of the name (e.g. name variants and TAQ values).

Line 10 creates a new SNEvalNameData object. This object is very similar to an SNQueryNameData
object, but with a few important differences. First, an SNEvalNameData object includes a method to
compare itself to an SNQueryNameData object. In addition, it defines score attributes to hold the
results of such a comparison. Most true search applications will create one SNQueryNameData object
(perhaps for a name keyed in by the user), and many SNEvalNameData objects (one for each name in a
database to be searched). Our simple application only compares two names, so the distinction between
the query name and the evaluation name is not as dramatic here. The SNQueryParms object that is
used to create this object should be the same SNQueryParms object that was used to create the
SNQueryNameData object above. In general, when comparing two name objects, both objects should
be created using the same SNQueryParms object.
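
The one-query, many-candidates pattern described above can be sketched as follows. This is only an illustration under stated assumptions: QueryName, EvalName, compareTo() and findMatches() are hypothetical stand-ins rather than SNAPI classes or functions, and the toy scoring rule (1.0 for identical strings, 0.0 otherwise) is not how SNAPI scores names.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the SNAPI classes.
struct QueryName {
    std::string text;
};

struct EvalName {
    std::string text;
    double score;   // holds the result of the most recent comparison

    explicit EvalName(const std::string &t) : text(t), score(0.0) {}

    // Toy comparison: 1.0 for identical strings, otherwise 0.0.
    bool compareTo(const QueryName &q, double threshold)
    {
        score = (text == q.text) ? 1.0 : 0.0;
        return score >= threshold;
    }
};

// One query object is compared against every candidate object.
std::vector<std::string> findMatches(const QueryName &query,
                                     std::vector<EvalName> &candidates,
                                     double threshold)
{
    std::vector<std::string> matches;
    for (EvalName &candidate : candidates) {
        if (candidate.compareTo(query, threshold))
            matches.push_back(candidate.text);
    }
    return matches;
}
```

As in the tutorial, the candidate object carries out the comparison and keeps the resulting score, so the caller can inspect scores after the loop.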

Line 11 defines an SNReturnCode variable. Many SNAPI functions return a value of this type.

Line 12 performs the actual comparison between the query name and the evaluation name. Note that we
pass the query name as a parameter to the evaluation name's performComp() method (only
SNEvalNameData defines a comparison function; SNQueryNameData does not). The
performComp() method conducts a comparison according to the parameters specified in the
SNQueryParms object used to create the evaluation name. It returns a value indicating whether the names
matched (SN_MATCH), did not match (SN_NO_MATCH), or whether there was some sort of error (various
other return codes). After the performComp() method returns, the SNEvalNameData object's score
attributes are set. These attributes include the given name score, surname score, and overall name score.

Line 13 ensures that there were no problems in performing the comparison.

Lines 14 through 17 examine the SNReturnCode value and print out a message indicating whether the 
name is considered a match or not. 

Line 18 prints out the scores that were computed during the performComp() method. It calls the
functions getNameScore(), getGnScore(), and getSnScore(). Each of these functions returns a value of type
double between 0.0 and 1.0 inclusive, where 1.0 represents an exact match.

Lines 20 through 25 handle the case where the comparison function produced some kind of error (an
SNReturnCode value other than SN_MATCH or SN_NO_MATCH). Line 22 calls the function
SN_get_error_text(), which retrieves the error text associated with a particular SNReturnCode value.
We pass in a buffer to hold the error text, along with the size of our buffer.
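
The tutorial declares the buffer as 1000 + 1 characters but passes 1000 as the size, which suggests that the size argument counts characters without the terminating null. How SN_get_error_text() actually fills the buffer is not shown, so the following is only a sketch of one bounded copy consistent with that calling convention; copyErrorText() is a hypothetical name.

```cpp
#include <cstring>

// Copies at most maxChars characters of msg into buf and always terminates it.
// Assumption: buf has room for maxChars + 1 characters, as in the tutorial.
void copyErrorText(const char *msg, char *buf, int maxChars)
{
    if (maxChars < 0)
        maxChars = 0;
    std::strncpy(buf, msg, static_cast<std::size_t>(maxChars));
    buf[maxChars] = '\0';   // strncpy does not terminate when msg is too long
}
```

With this convention a caller that declares char errorBuffer[1000 + 1] and passes 1000 can never overrun the buffer, even when the message is longer than 1000 characters.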

Lines 26 and 27 delete the SNQueryNameData and SNEvalNameData objects that we created at the 
beginning of the program. Any SNAPI objects that are created by the developer must be deleted. 






SNAPI is a trademark of Language Analysis Systems. All other products mentioned are registered trademarks or 
trademarks of their respective companies. 

Questions or problems regarding this web site should be directed to webmaster@las-inc.com.
Copyright © 1997 Language Analysis Systems. All rights reserved.
Last modified: Friday January 23, 1998.

FAQ - Frequently Asked Questions

This page contains answers to common questions. The FAQ is broken into two sections: a general FAQ
and a developer's FAQ.

General Questions: 

1. What is SNAPI? 

2. What platforms does SNAPI support? 

3. How much does it cost? 

4. What kind of support/training options are available? 

5. What kind of documentation is provided? 

6. What is the current production release number for SNAPI? 

7. Can SNAPI interface with my database (e.g. Oracle, Sybase, etc.)?

8. What type of indexes does SNAPI support?

9. What performance benchmarks are available?

10. What are the machine resource requirements (Memory, disk space) for SNAPI?



Developer Questions: 

1. Which header files should I include in my programs?

2. Does the API support multi-threading?

3. How can I only look at surnames when doing a name comparison?

4. I have a middle name field in my database. How do I include this information in my name
comparisons?

5. My application is written in (COBOL, FORTRAN, Java). How can I use SNAPI to add name
checking to my app?

6. How do I associate other data (such as database record ids) with the name objects SNAPI uses? I
need this so that when SNAPI tells me a name matches my query, I can look up other information
associated with that name.

7. I have other information (such as age and social security number) that I want to include in the
comparison process. How do I do this?

8. Which C++ compilers does SNAPI support?

9. My database contains names of mixed culture. Which culture should I specify when I create my
parameters object?

10. My database has over 10 million names. Can SNAPI handle this kind of volume?

11. I am planning to create name objects for every name in my database, and hold them in memory,
reusing them for each query. How much memory will I need to do this?

12. How can I change the scores assigned to a particular name variant association? How can I delete
variant associations or add new ones?

13. How can I delete TAQ values or add new ones?

14. What types of name formats does SNAPI support?



1. What is SNAPI?

SNAPI is an application program interface that lets a developer add sophisticated name
searching capabilities to an application. SNAPI is not an end product, but is a tool to
integrate name searching and comparison capabilities into applications. SNAPI is a C++
API, and requires a C++ compiler.

Back to Top



2. What platforms does SNAPI support? 

SNAPI works on any platform that supports a C++ compiler. LAS has tested the API on
both Windows (95 and NT) and UNIX systems.

Back to Top 



3. How much does it cost? 

Pricing for SNAPI is dependent on a number of factors. Please contact LAS at
info@las-inc.com for details.

Back to Top 



4. What kind of support/training options are available? 

A variety of training, support, and consulting options are available to licensees of SNAPI.
Please contact LAS at info@las-inc.com to discuss your particular needs.

Back to Top



5. What kind of documentation is provided? 

Documentation for SNAPI is provided in HTML format. The documentation includes an
overview, a tutorial, and full documentation of the classes and functions that make up the
API.

Back to Top

6. What is the current production release number for SNAPI? 

SNAPI version 1.0 is currently in beta testing, with an expected release in Q1 1998.

Back to Top

7. Can SNAPI interface with my database (e.g. Oracle, Sybase, etc.)?

SNAPI makes no assumptions about data storage mechanisms. Data access is the
responsibility of the developer.

Back to Top 



8. What type of indexes does SNAPI support? 

Version 1.0 of SNAPI does not include any type of indexing support. When searching a
database for a query name, all names are evaluated. However, a developer can often
segment a database based on application-specific rules. For example, a query that considers
both name and age might only retrieve names from the database for people who are within
10 years of the age specified with the query.

The next version of SNAPI will include mechanisms to support one or more indexing 
strategies. 
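
The age-based segmentation mentioned in this answer could look roughly like the sketch below. The Record type and withinAgeWindow() are hypothetical application code, not part of SNAPI; in practice the filtering would more likely be done in the database query that fetches the candidate names.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical application record; SNAPI itself knows nothing about ages.
struct Record {
    std::string name;
    int age;
};

// Keeps only the records whose age is within `window` years of queryAge,
// so that fewer names have to be passed on to the name comparison.
std::vector<Record> withinAgeWindow(const std::vector<Record> &db,
                                    int queryAge, int window)
{
    std::vector<Record> subset;
    for (const Record &r : db) {
        if (std::abs(r.age - queryAge) <= window)
            subset.push_back(r);
    }
    return subset;
}
```

Only the names in the returned subset would then be turned into name objects and compared against the query name.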

Back to Top



9. What performance benchmarks are available? 

On an Intel Pentium 133 MHz machine, SNAPI performs roughly 10,000 name 
comparisons per second. Because SNAPI is CPU intensive, significantly higher 
performance can be achieved through the use of faster hardware. Future indexing 
capabilities will also vastly increase overall performance when dealing with very large
databases. 

Back to Top 



10. What are the machine resource requirements (Memory, disk space) for SNAPI? 

Because SNAPI relies on the developer to supply data (the names to be compared), the disk 
space needed is negligible. Memory requirements depend on the use of the API. For 
example, an application that creates a name object for each name in its database and keeps
all of these objects in memory will require much more memory than an application that
re-creates name objects for each query.

Back to Top



Developer Questions 



1. Which header files should I include in my programs?

Programs that use SNAPI classes or functions should include the snapi.h header file:

#include <snapi.h>

This file contains all necessary definitions for full use of the API.

Back to Top



2. Does the API support multi-threading? 

SNAPI does not currently support multi-threading. This means that a multithreaded
application should not allow two separate threads to access the same SNAPI objects at the
same time. However, an application can perform two concurrent queries, each in a separate
thread, so long as each thread has its own name objects (SNQueryNameData,
SNEvalNameData) and results list object (SNResultsList).

Back to Top



3. How can I only look at surnames when doing a name comparison?

SNAPI provides parameters to weight the given name and surname fields relative to each
other. This weight is applied when calculating a composite score for the name as a whole
(the composite name score is calculated by performing a weighted average of the separate
given name and surname scores). By setting the given name weight to 0.0, a developer can
cause name comparisons to be based solely upon the surname. In doing so, the given name
threshold should probably also be set to 0.0, to ensure that a poorly matched given name
does not prevent a name from being considered a match. See documentation on the
following functions for more details: setGnWeight(), setSnWeight(), setGnScoreThresh(),
setSnScoreThresh().

Back to Top 



4. I have a middle name field in my database. How do I include this information in
my name comparisons?

SNAPI uses an internal name model that considers just given name and surname. However,
both name objects (SNEvalNameData and SNQueryNameData) include a constructor that
accepts a separate middle name. Internally, these constructors append this middle name to
the given name.

Future versions may add middle name as an additional name field.

Back to Top



5. My application is written in (COBOL, FORTRAN, Java). How can I use SNAPI 
to add name checking to my app? 

Direct use of SNAPI requires a C++ compiler. Many programming environments
include the ability to link in object code (compiled code) from other languages. For
example, a mainframe COBOL application could call code written by a developer in C++
that uses SNAPI.

based socket).

independently for the client and server.

Back to Top



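6. How do I associate other data (such as database record ids) with the name objects
SNAPI uses? I need this so that when SNAPI tells me a name matches my query, I can
look up other information associated with that name.

The SNAPI name objects (SNEvalNameData and SNQueryNameData) [...]

For example, a developer can create a subclass [...]
associated database record id.

See the subclassing sections of SNEvalNameData and SNQueryNameData for more details.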
Back to Top 



7. I have other information (such as age and social security number) that I want to
include in the comparison process. How do I do this?

By subclassing the SNAPI name classes (SNEvalNameData and SNQueryNameData), a
developer can add new data elements into the comparison process.

The SNEvalNameData class contains methods to perform score calculations and match
determination. These methods are virtual, so that a developer can alter or extend the
functionality to include data specific to their application.

For example, an SSN data element could be added to subclasses of SNEvalNameData and
SNQueryNameData. The SNEvalNameData subclass could then override any or all [...]
be altered to relax the name score threshold if the SSN values matched exactly.

This is just one possible implementation. See the subclassing sections of SNEvalNameData
and SNQueryNameData for more details.
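
The subclassing idea can be sketched with stand-in classes. Everything here is an assumption made for illustration: EvalNameBase is not SNEvalNameData, the method names are invented, and halving the threshold on an exact SSN match is an arbitrary rule. The real virtual methods are described in the subclassing sections referred to above.

```cpp
#include <string>

// Hypothetical stand-in for an evaluation-name class; NOT the SNAPI class.
class EvalNameBase {
public:
    EvalNameBase(double nameScore, double threshold)
        : nameScore_(nameScore), threshold_(threshold) {}
    virtual ~EvalNameBase() {}

    // Match determination, built on a virtual hook that subclasses can extend.
    bool isMatch() const { return nameScore_ >= effectiveThreshold(); }

protected:
    virtual double effectiveThreshold() const { return threshold_; }

    double nameScore_;
    double threshold_;
};

// Subclass that adds an SSN data element and relaxes the name score
// threshold when the SSN values match exactly.
class EvalNameWithSsn : public EvalNameBase {
public:
    EvalNameWithSsn(double nameScore, double threshold,
                    const std::string &ssn, const std::string &querySsn)
        : EvalNameBase(nameScore, threshold), ssn_(ssn), querySsn_(querySsn) {}

protected:
    double effectiveThreshold() const override
    {
        const bool ssnMatch = !ssn_.empty() && ssn_ == querySsn_;
        return ssnMatch ? threshold_ * 0.5 : threshold_;   // 0.5 is arbitrary
    }

private:
    std::string ssn_;
    std::string querySsn_;
};
```

A name score of 0.5 against a threshold of 0.8 is normally rejected, but is accepted by the subclass when both SSN values are identical, because the effective threshold drops to 0.4.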

Back to Top 



8. Which C++ compilers does SNAPI support?

SNAPI can be compiled with any modern C++ compiler. LAS has developed sample
applications using Microsoft Visual C++ 4.2 and 5.0 compilers, as well as the GNU g++
compiler under Solaris 2.5 and Linux 2.0.

Back to Top



9. My database contains names of mixed culture. Which culture should I specify 
when I create my parameters object? 

The Anglo culture is generally suitable for use in "generic" queries. This covers queries
where the culture of the names is Anglo, unknown, or mixed. In addition, a developer may
either allow users to specify a culture when performing a query or establish pre-defined
cultural searches. The user-specified or pre-defined culture could then be used to create all
requisite objects for performing a comparison. This approach would require the developer to
create a new SNQueryParms object for each query, and to ensure that the
SNEvalNameData and SNQueryNameData objects are created using the same
SNQueryParms object.

Future versions may include the ability to automatically classify the culture of a name and 
set search parameters as appropriate. 

Back to Top 



10. My database has over 10 million names. Can SNAPI handle this kind of 
volume? 

The current version of SNAPI is capable of performing approximately 10,000 name
comparisons per second on an Intel Pentium 133 MHz machine. Because name comparisons
are CPU intensive, faster hardware can dramatically reduce search times. There are also
other mechanisms that can be used to address performance issues:

• Developers can often find ways of segmenting their data to avoid having to search the
entire database for each query.

• Name objects can be constructed and stored in memory so that they do not have to be 
retrieved and constructed for each query. This can require large amounts of memory. 

• Indexing capabilities will be available in the next version of the API. 
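
As a rough worked example of what these figures imply for the 10 million names in the question, the sketch below divides the number of names by the quoted comparison rate. fullScanSeconds() is a hypothetical helper, and the estimate ignores the time needed to fetch names and construct name objects.

```cpp
// Seconds needed to compare one query name against every candidate name,
// with no indexing or segmentation.
double fullScanSeconds(double nameCount, double comparisonsPerSecond)
{
    return nameCount / comparisonsPerSecond;
}
```

At 10,000 comparisons per second, 10,000,000 names take about 1,000 seconds (roughly 17 minutes) per query. Segmenting the data so that only 1% of the names are evaluated brings this down to about 10 seconds.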

Back to Top






Customer Support - FAQ 



http://panther.las-inc.com/product/faq.htm



11. I am planning to create name objects for every name in my database, and hold
them in memory, reusing them for each query. How much memory will I need to do
this?

The size of an SNEvalNameData object depends on the length of the name it represents. An
average name requires approximately 600 bytes (the actual amount of storage can vary
depending on the compiler used). Thus a database of 30,000 names would require about 18 to
20 megabytes to be stored in memory.
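
The estimate is a simple multiplication, sketched below. nameObjectBytes() is a hypothetical helper, and 600 bytes is the approximate average quoted above rather than a guaranteed object size.

```cpp
// Approximate bytes needed to hold one name object per database name in memory.
unsigned long long nameObjectBytes(unsigned long long nameCount,
                                   unsigned long long bytesPerName)
{
    return nameCount * bytesPerName;
}
```

For 30,000 names at 600 bytes each this gives 18,000,000 bytes, the lower end of the range quoted above. The same arithmetic for 10 million names gives 6,000,000,000 bytes, which is why holding every name object in memory can require large amounts of memory.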

Back to Top 



12. How can I change the scores assigned to a particular name variant association?
How can I delete variant associations or add new ones?

The current version of SNAPI does permit the developer to make modifications to the
variant information. In addition, variant checking can be turned off entirely. See the
setUseGnVariants() and setUseSnVariants() methods for details.

Back to Top



13. How can I delete TAQ values or add new ones?

The current version of SNAPI does permit the developer to make modifications to the TAQ
information. In addition, the way TAQs are processed can be adjusted by the developer. See
the setGnTAQProcessingMode() and setSnTAQProcessingMode() methods for more
details.

Back to Top



14. What types of name formats does SNAPI support? 

The developer is responsible for accessing application data (e.g. via calls to a database or
reading from a file). Therefore, SNAPI does not care about how the data is stored. However,
SNAPI does provide several ways of constructing name objects:

1. Given name and surname specified as separate string variables.

2. Given name, middle name, and surname specified as separate string variables.

3. Name specified as a single string in a comma delimited format (surname, given name).

4. Name specified as a single string, with the last stem segment interpreted as the surname.

The first form is the most efficient, because no parsing has to be done to separate the name 
into given name and surname. The fourth form includes advanced processing to identify 
TAQ values and exclude them when determining the surname. 
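
The parsing that the single-string forms require can be illustrated for the comma delimited format. This is a sketch only: ParsedName, trimSpaces() and parseSurnameFirst() are hypothetical names, and SNAPI's own parsing (including its TAQ handling) is not shown in this document.

```cpp
#include <string>

struct ParsedName {
    std::string givenName;
    std::string surname;
};

// Removes leading and trailing spaces and tabs.
static std::string trimSpaces(const std::string &s)
{
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos)
        return "";
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Splits "surname, given name" at the first comma. With no comma, the whole
// string is treated as the surname (an arbitrary choice for this sketch).
ParsedName parseSurnameFirst(const std::string &fullName)
{
    ParsedName parsed;
    const std::string::size_type comma = fullName.find(',');
    if (comma == std::string::npos) {
        parsed.surname = trimSpaces(fullName);
    } else {
        parsed.surname = trimSpaces(fullName.substr(0, comma));
        parsed.givenName = trimSpaces(fullName.substr(comma + 1));
    }
    return parsed;
}
```

Supplying the given name and surname as separate strings avoids this step altogether, which is why the first form is described as the most efficient.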

Back to Top

// File: NH_util.cpp
// Description:
//    Implementation of various utility functions used in the SNAPI
//
// History:
//    5/15/97  EFB  Created
//    3/20/98  EFB  Changed names to NH from SN

#include <string.h>

#include "NH_util.hpp"
#include "NHCompParms.hpp"
#include "NHCompParms.hpp" 



// function to remove leading and trailing spaces from a string
// in place.
// Strips the string at either end or both ends.
// Stripchars specify the characters that should
// be stripped. We start by seeing if they want the
// trailing chars stripped, which is easy. We simply
// work backwards from the end of the string, looking for
// the first non-strippable character, and terminate the
// string just past that character. Then if they wanted
// leading chars stripped, we work forwards to the first
// non-strippable char, and then move that and each following
// char to the beginning of the string.
void NH_strip(char *aString)
{
    char *end_point;
    char *ch;
    int len;

    if ((len = strlen(aString)) != 0) {   // if there is a string
        // start at end
        end_point = aString + len - 1;

        // and work back till we get a non-space or get to
        // the beginning of our string, chopping off what's left.
        // Also make sure we don't zoom right past the beginning of the
        // string.
        for (; strchr(NH_DEFAULT_WHITESPACE, *end_point) != NULL &&
               end_point != aString; end_point--)
            ;

        // if string was all whitespace
        if ((end_point == aString) && strchr(NH_DEFAULT_WHITESPACE,
                                              *aString) != NULL)
            *aString = EOS;           // erase it all, and we're done, could return here
        else
            *(end_point + 1) = EOS;   // just chop off excess blanks

        // make sure there is still a string, since it might
        // have been stripped entirely above.
        if (*aString) {
            // now find first non space. we know string has at least one
            // nonwhite space, so we don't have to check for NULL.
            for (ch = aString; strchr(NH_DEFAULT_WHITESPACE, *ch) != NULL; ch++)
                ;

            if (ch != aString) {   // if there were leading spaces, move the block back
                char *target = aString;
                while (*ch != EOS) {
                    *target = *ch;
                    target++;
                    ch++;
                }
                // and get the null char also
                *target = *ch;
            } // end if (are there leading spaces?)
        } // end if (any text left?)
    } // end (is there a string at all?)
}



char * NH_strrchr(char *stringStart, char *searchPos, char searchChar)
{
    while (1) {
        if (*searchPos == searchChar)
            break;

        if (searchPos == stringStart) {
            searchPos = NULL;   // string not found, so return NULL
            break;
        }

        searchPos--;
    }

    return searchPos;
}



// File: NH_queens_arrays.hpp
// Description:
//    Contains global definitions and declarations for the valid
//    combinations of indexes for the best score calculation
// History:
//    6/4/97   EFB  Created
//    3/20/98  EFB  Changed names to NH from SN

typedef unsigned char byte;

byte twoByTwo[] = { 1, 0,
    0, 1 };

byte twoByThree[] = { 1, 2,
    1, 0,
    2, 1,
    2, 0,
    0, 1,
    0, 2 };

byte twoByFour[] = { 1, 2,
    1, 3,
    1, 0,
    2, 1,
    2, 3,
    2, 0,
    3, 1,
    3, 2,
    3, 0,
    0, 1,
    0, 2,
    0, 3 };

byte twoByFive[] = { 1, 2,
    1, 3,
    1, 4,
    1, 0,
    2, 1,
    2, 3,
    2, 4,
    2, 0,
    3, 1,
    3, 2,
    3, 4,
    3, 0,
    4, 1,
    4, 2,
    4, 3,
    4, 0,
    0, 1,
    0, 2,
    0, 3,
    0, 4 };

byte threeByThree[] = { 1, 2, 0,
    1, 0, 2,
    2, 1, 0,
    2, 0, 1,
    0, 1, 2,
    0, 2, 1 };

byte threeByFour[] = { 1, 2, 3,
    1, 2, 0,
    1, 3, 2,
    1, 3, 0,
    1, 0, 2,
    1, 0, 3,
    2, 1, 3,
    2, 1, 0,
    2, 3, 1,
    2, 3, 0,
    2, 0, 1,
    2, 0, 3,
    3, 1, 2,
    3, 1, 0,
    3, 2, 1,
    3, 2, 0,
    3, 0, 1,
    3, 0, 2,
    0, 1, 2,
    0, 1, 3,
    0, 2, 1,
    0, 2, 3,
    0, 3, 1,
    0, 3, 2 };

byte threeByFive[] = { 1, 2, 3,
    1, 2, 4,
    1, 2, 0,
    1, 3, 2,
    1, 3, 4,
    1, 3, 0,
    1, 4, 2,
    1, 4, 3,
    1, 4, 0,
    1, 0, 2,
    1, 0, 3,
    1, 0, 4,
    2, 1, 3,
    2, 1, 4,
    2, 1, 0,
    2, 3, 1,
    2, 3, 4,
    2, 3, 0,
    2, 4, 1,
    2, 4, 3,
    2, 4, 0,
    2, 0, 1,
    2, 0, 3,
    2, 0, 4,
    3, 1, 2,
    3, 1, 4,
    3, 1, 0,
    3, 2, 1,
    3, 2, 4,
    3, 2, 0,
    3, 4, 1,
    3, 4, 2,
    3, 4, 0,
    3, 0, 1,
    3, 0, 2,
    3, 0, 4,
    4, 1, 2,
    4, 1, 3,
    4, 1, 0,
    4, 2, 1,
    4, 2, 3,
    4, 2, 0,
    4, 3, 1,
    4, 3, 2,
    4, 3, 0,
    4, 0, 1,
    4, 0, 2,
    4, 0, 3,
    0, 1, 2,
    0, 1, 3,
    0, 1, 4,
    0, 2, 1,
    0, 2, 3,
    0, 2, 4,
    0, 3, 1,
    0, 3, 2,
    0, 3, 4,
    0, 4, 1,
    0, 4, 2,
    0, 4, 3 };

byte fourByFour[] = { 1, 2, 3, 0,
    1, 2, 0, 3,
    1, 3, 0, 2,
    1, 3, 2, 0,
    1, 0, 2, 3,
    1, 0, 3, 2,
    2, 1, 3, 0,
    2, 1, 0, 3,
    2, 3, 1, 0,
    2, 3, 0, 1,
    2, 0, 1, 3,
    2, 0, 3, 1,
    3, 1, 2, 0,
    3, 1, 0, 2,
    3, 2, 1, 0,
    3, 2, 0, 1,
    3, 0, 1, 2,
    3, 0, 2, 1,
    0, 1, 2, 3,
    0, 1, 3, 2,
    0, 2, 1, 3,
    0, 2, 3, 1,
    0, 3, 1, 2,
    0, 3, 2, 1 };

byte fourByFive[] = { 1, 2, 3, 4,
    1, 2, 3, 0,
    1, 2, 4, 3,
    1, 2, 4, 0,
    1, 2, 0, 3,
    1, 2, 0, 4,
    1, 3, 2, 4,
    1, 3, 2, 0,
    1, 3, 4, 2,
    1, 3, 4, 0,
    1, 3, 0, 2,
    1, 3, 0, 4,
    1, 4, 2, 3,
    1, 4, 2, 0,
    1, 4, 3, 2,
    1, 4, 3, 0,
    1, 4, 0, 2,
    1, 4, 0, 3,
    1, 0, 2, 3,
    1, 0, 2, 4,
    1, 0, 3, 2,
    1, 0, 3, 4,
    1, 0, 4, 2,
    1, 0, 4, 3,
    2, 1, 3, 4,
    2, 1, 3, 0,
    2, 1, 4, 3,
    2, 1, 4, 0,
    2, 1, 0, 3,
    2, 1, 0, 4,
    2, 3, 1, 4,
    2, 3, 1, 0,
    2, 3, 4, 1,
    2, 3, 4, 0,
    2, 3, 0, 1,
    2, 3, 0, 4,
    2, 4, 1, 3,
    2, 4, 1, 0,
    2, 4, 3, 1,
    2, 4, 3, 0,
    2, 4, 0, 1,
    2, 4, 0, 3,
    2, 0, 1, 3,
    2, 0, 1, 4,
    2, 0, 3, 1,
    2, 0, 3, 4,
    2, 0, 4, 1,
    2, 0, 4, 3,
    3, 2, 1, 4,
    3, 2, 1, 0,
    3, 2, 4, 1,
    3, 2, 4, 0,
    3, 2, 0, 1,
    3, 2, 0, 4,
    3, 1, 2, 4,
    3, 1, 2, 0,
    3, 1, 4, 2,
    3, 1, 4, 0,
    3, 1, 0, 2,
    3, 1, 0, 4,
    3, 4, 2, 1,
    3, 4, 2, 0,
    3, 4, 1, 2,
    3, 4, 1, 0,
    3, 4, 0, 2,
    3, 4, 0, 1,
    3, 0, 2, 1,
    3, 0, 2, 4,
    3, 0, 1, 2,
    3, 0, 1, 4,
    3, 0, 4, 2,
    3, 0, 4, 1,
    4, 2, 3, 1,
    4, 2, 3, 0,
    4, 2, 1, 3,
    4, 2, 1, 0,
    4, 2, 0, 3,
    4, 2, 0, 1,
    4, 3, 2, 1,
    4, 3, 2, 0,
    4, 3, 1, 2,
    4, 3, 1, 0,
    4, 3, 0, 2,
    4, 3, 0, 1,
    4, 1, 2, 3,
    4, 1, 2, 0,
    4, 1, 3, 2,
    4, 1, 3, 0,
    4, 1, 0, 2,
    4, 1, 0, 3,
    4, 0, 2, 3,
    4, 0, 2, 1,
    4, 0, 3, 2,
    4, 0, 3, 1,
    4, 0, 1, 2,
    4, 0, 1, 3,
    0, 2, 3, 4,
    0, 2, 3, 1,
    0, 2, 4, 3,
    0, 2, 4, 1,
    0, 2, 1, 3,
    0, 2, 1, 4,
    0, 3, 2, 4,
    0, 3, 2, 1,
    0, 3, 4, 2,
    0, 3, 4, 1,
    0, 3, 1, 2,
    0, 3, 1, 4,
    0, 4, 2, 3,
    0, 4, 2, 1,
    0, 4, 3, 2,
    0, 4, 3, 1,
    0, 4, 1, 2,
    0, 4, 1, 3,
    0, 1, 2, 3,
    0, 1, 2, 4,
    0, 1, 3, 2,
    0, 1, 3, 4,
    0, 1, 4, 2,
    0, 1, 4, 3 };

byte fiveByFive[] = { 1, 2, 3, 4, 0,
    1, 2, 3, 0, 4,
    1, 2, 4, 3, 0,
    1, 2, 4, 0, 3,
    1, 2, 0, 3, 4,
    1, 2, 0, 4, 3,
    1, 3, 2, 4, 0,
    1, 3, 2, 0, 4,
    1, 3, 4, 2, 0,
    1, 3, 4, 0, 2,
    1, 3, 0, 2, 4,
    1, 3, 0, 4, 2,
    1, 4, 2, 3, 0,
    1, 4, 2, 0, 3,
    1, 4, 3, 2, 0,
    1, 4, 3, 0, 2,
    1, 4, 0, 2, 3,
    1, 4, 0, 3, 2,
    1, 0, 2, 3, 4,
    1, 0, 2, 4, 3,
    1, 0, 3, 2, 4,
    1, 0, 3, 4, 2,
    1, 0, 4, 2, 3,
    1, 0, 4, 3, 2,
    2, 1, 3, 4, 0,
    2, 1, 3, 0, 4,
    2, 1, 4, 3, 0,
    2, 1, 4, 0, 3,
    2, 1, 0, 3, 4,
    2, 1, 0, 4, 3,
    2, 3, 1, 4, 0,
    2, 3, 1, 0, 4,
    2, 3, 4, 1, 0,
    2, 3, 4, 0, 1,
    2, 3, 0, 1, 4,
    2, 3, 0, 4, 1,
    2, 4, 1, 3, 0,
    2, 4, 1, 0, 3,
    2, 4, 3, 1, 0,
    2, 4, 3, 0, 1,
    2, 4, 0, 1, 3,
    2, 4, 0, 3, 1,
    2, 0, 1, 3, 4,
    2, 0, 1, 4, 3,
    2, 0, 3, 1, 4,
    2, 0, 3, 4, 1,
    2, 0, 4, 1, 3,
    2, 0, 4, 3, 1,
    3, 2, 1, 4, 0,
    3, 2, 1, 0, 4,
    3, 2, 4, 1, 0,
    3, 2, 4, 0, 1,
    3, 2, 0, 1, 4,
    3, 2, 0, 4, 1,
    3, 1, 2, 4, 0,
    3, 1, 2, 0, 4,
    3, 1, 4, 2, 0,
    3, 1, 4, 0, 2,
    3, 1, 0, 2, 4,
    3, 1, 0, 4, 2,
    3, 4, 2, 1, 0,
    3, 4, 2, 0, 1,
    3, 4, 1, 2, 0,
    3, 4, 1, 0, 2,
    3, 4, 0, 2, 1,
    3, 4, 0, 1, 2,
    3, 0, 2, 1, 4,
    3, 0, 2, 4, 1,
    3, 0, 1, 2, 4,
    3, 0, 1, 4, 2,
    3, 0, 4, 2, 1,
    3, 0, 4, 1, 2,
    4, 2, 3, 1, 0,
    4, 2, 3, 0, 1,
    4, 2, 1, 3, 0,
    4, 2, 1, 0, 3,
    4, 2, 0, 3, 1,
    4, 2, 0, 1, 3,
    4, 3, 2, 1, 0,
    4, 3, 2, 0, 1,
    4, 3, 1, 2, 0,
    4, 3, 1, 0, 2,
    4, 3, 0, 2, 1,
    4, 3, 0, 1, 2,
    4, 1, 2, 3, 0,
    4, 1, 2, 0, 3,
    4, 1, 3, 2, 0,
    4, 1, 3, 0, 2,
    4, 1, 0, 2, 3,
    4, 1, 0, 3, 2,
    4, 0, 2, 3, 1,
    4, 0, 2, 1, 3,
    4, 0, 3, 2, 1,
    4, 0, 3, 1, 2,
    4, 0, 1, 2, 3,
    4, 0, 1, 3, 2,
    0, 2, 3, 4, 1,
    0, 2, 3, 1, 4,
    0, 2, 4, 3, 1,
    0, 2, 4, 1, 3,
    0, 2, 1, 3, 4,
    0, 2, 1, 4, 3,
    0, 3, 2, 4, 1,
    0, 3, 2, 1, 4,
    0, 3, 4, 2, 1,
    0, 3, 4, 1, 2,
    0, 3, 1, 2, 4,
    0, 3, 1, 4, 2,
    0, 4, 2, 3, 1,
    0, 4, 2, 1, 3,
    0, 4, 3, 2, 1,
    0, 4, 3, 1, 2,
    0, 4, 1, 2, 3,
    0, 4, 1, 3, 2,
    0, 1, 2, 3, 4,
    0, 1, 2, 4, 3,
    0, 1, 3, 2, 4,
    0, 1, 3, 4, 2,
    0, 1, 4, 2, 3,
    0, 1, 4, 3, 2 };
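
Each table above appears to list every ordered selection of k distinct indexes out of n (for example, twoByThree holds the 6 ordered pairs drawn from 0..2). The sketch below, which is not part of the original sources, generates such selections so that a table can be checked for size and validity. The function names are hypothetical, and the rows come out in plain ascending order rather than the 1, 2, ..., 0 order used in the listing.

```cpp
#include <vector>

// Appends to `out` every ordered selection of k distinct indexes from 0..n-1
// that starts with the indexes already held in `current`.
static void extendSelection(int n, int k, std::vector<int> &current,
                            std::vector<bool> &used,
                            std::vector<std::vector<int> > &out)
{
    if (static_cast<int>(current.size()) == k) {
        out.push_back(current);
        return;
    }
    for (int i = 0; i < n; ++i) {
        if (used[i])
            continue;
        used[i] = true;
        current.push_back(i);
        extendSelection(n, k, current, used, out);
        current.pop_back();
        used[i] = false;
    }
}

// Returns all ordered selections of k distinct indexes out of n.
std::vector<std::vector<int> > indexCombinations(int k, int n)
{
    std::vector<std::vector<int> > out;
    std::vector<int> current;
    std::vector<bool> used(n, false);
    extendSelection(n, k, current, used, out);
    return out;
}
```

The number of rows is n! / (n - k)!, which gives 6 rows for twoByThree, 60 for threeByFive, and 120 for both fourByFive and fiveByFive. No row should contain the same index twice.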






// File: NH_getErrorText.cpp
//
// Description:
//
//    Implementation to the NH_getErrorText function. This function can
//    be used to return the error text for an associated error code.
//
//
// History:
//
//    6/23/97    EFB    Created
//    3/20/98    EFB    Changed names to NH from SN
//

#include "NH_get_error_text.h"
#include <string.h>



void NH_get_error_text(NHReturnCode errorCode, char *textBuffer, int maxChars)
{
    // points at a string literal, so it must be a pointer to const
    const char *errorMsgPtr;

    switch (errorCode) {
        case NH_SUCCESS:
            errorMsgPtr = "Operation successful";
            break;
        case NH_MATCH:
            errorMsgPtr = "The comparison matched";
            break;
        case NH_NO_MATCH:
            errorMsgPtr = "The comparison did not match";
            break;
        case NH_INVALID_SCORE_THRESH:
            errorMsgPtr = "The threshold must be between 0.0 and 1.0";
            break;
        case NH_INVALID_GN_INIT_SCORE:
            errorMsgPtr = "The GN initial score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_INIT_SCORE:
            errorMsgPtr = "The SN initial score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_GN_INIT_ON_INIT_MATCH_SCORE:
            errorMsgPtr = "The GN initial on initial match score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_INIT_ON_INIT_MATCH_SCORE:
            errorMsgPtr = "The SN initial on initial match score must be between 0.0 and 1.0";
            break;

        case NH_INVALID_NFN_SCORE:
            errorMsgPtr = "The NFN score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_FNU_SCORE:
            errorMsgPtr = "The FNU score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NLN_SCORE:
            errorMsgPtr = "The NLN score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_LNU_SCORE:
            errorMsgPtr = "The LNU score must be between 0.0 and 1.0";
            break;

        case NH_INVALID_GN_ANCHOR_FACTOR:
            errorMsgPtr = "The GN anchor score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_ANCHOR_FACTOR:
            errorMsgPtr = "The SN anchor score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_GN_OOPS_FACTOR:
            errorMsgPtr = "The GN OOPS factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_OOPS_FACTOR:
            errorMsgPtr = "The SN OOPS factor must be between 0.0 and 1.0";
            break;

        case NH_INVALID_ABS_DEL_GN_TAQ_FACTOR:
            errorMsgPtr = "The Abs delete GN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_ABS_DIS_GN_TAQ_FACTOR:
            errorMsgPtr = "The Abs disregard GN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_ABS_DEL_NH_TAQ_FACTOR:
            errorMsgPtr = "The Abs delete SN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_ABS_DIS_NH_TAQ_FACTOR:
            errorMsgPtr = "The Abs disregard SN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_DEL_GN_TAQ_FACTOR:
            errorMsgPtr = "The delete GN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_DIS_GN_TAQ_FACTOR:
            errorMsgPtr = "The disregard GN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_DEL_NH_TAQ_FACTOR:
            errorMsgPtr = "The delete SN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_DIS_NH_TAQ_FACTOR:
            errorMsgPtr = "The disregard SN TAQ factor must be between 0.0 and 1.0";
            break;
        case NH_INVALID_GN_COMPRESSED_NAME_SCORE:
            errorMsgPtr = "The GN compressed name score must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_COMPRESSED_NAME_SCORE:
            errorMsgPtr = "The SN compressed name score must be between 0.0 and 1.0";
            break;
        case NH_RESULTS_LIST_INSERT_ALLOC_FAILURE:
            errorMsgPtr = "Could not allocate space for a new results list";
            break;
        case NH_GN_VAR_TABLE_CREATION_ERROR:
            errorMsgPtr = "Problem creating GN variant table";
            break;
        case NH_NH_VAR_TABLE_CREATION_ERROR:
            errorMsgPtr = "Problem creating SN variant table";
            break;
        case NH_TAQ_TABLE_CREATION_ERROR:
            errorMsgPtr = "Problem creating TAQ table";
            break;
        case NH_SEG_BREAK_CHARS_CREATION_ERROR:
            errorMsgPtr = "Problem creating segment break characters string";
            break;

        case NH_NOISE_CHARS_CREATION_ERROR:
            errorMsgPtr = "Problem creating noise characters string";
            break;
        case NH_INVALID_RESULTS_LIST_SIZE:
            errorMsgPtr = "Invalid size requested for results list";
            break;
        case NH_RESULTS_LIST_ALLOCATION_ERROR:
            errorMsgPtr = "Problem creating internal results list storage";
            break;
        case NH_RESULTS_ARRAY_NULL_ERROR:
            errorMsgPtr = "Internal results list storage is invalid";
            break;
        case NH_TAQ_RECORD_ALLOC_ERROR:
            errorMsgPtr = "Problem allocating space for new TAQ record";
            break;

        case NH_VARIANT_ALLOC_ERROR:
            errorMsgPtr = "Problem allocating space for new variant record";
            break;
        case NH_VARIANTS_DONT_EXIST:
            errorMsgPtr = "The supplied names are not currently variants";
            break;
        case NH_INVALID_VARIANT_SCORE:
            errorMsgPtr = "Variant scores must be between 0.0 and 1.0";
            break;
        case NH_MAX_VARIANT_SIZE_INCREMENT_FAILED:
            errorMsgPtr = "Could not increase variant storage to add new variant relationship";
            break;
        case NH_VARIANT_ALREADY_RELATED:
            errorMsgPtr = "The names are already related to each other";
            break;
        case NH_COMP_PARMS_BAD_STREAM_ON_CONSTRUCT:
            errorMsgPtr = "The comp parameters stream passed to the constructor is invalid";
            break;
        case NH_COMP_PARMS_BAD_STREAM_ON_ARCHIVE:
            errorMsgPtr = "The comp parameters stream passed to the archiveData method is invalid";
            break;
        case NH_NAME_PARMS_FILE_NOISE_CHARS_ERROR:
            errorMsgPtr = "The noise characters could not be read";
            break;
        case NH_NAME_PARMS_FILE_BREAKS_CHARS_ERROR:
            errorMsgPtr = "The break characters could not be read";
            break;
        case NH_NAME_PARMS_BAD_STREAM_ON_CONSTRUCT:
            errorMsgPtr = "The Name Parameters stream passed to the constructor was bad";
            break;
        case NH_NAME_PARMS_BAD_STREAM_ON_WRITE:
            errorMsgPtr = "The Name Parameters stream passed to the archive method was bad";
            break;
        case NH_NAME_PARMS_FILE_BAD_CULTURE_CODE:
            errorMsgPtr = "The culture code read from the Name parameters stream was invalid";
            break;
        case NH_TAQ_NOT_FOUND:
            errorMsgPtr = "The specified TAQ could not be found";
            break;
        case NH_TAQ_ALREADY_EXISTS:
            errorMsgPtr = "The specified TAQ is already defined";
            break;

        case NH_INVALID_GN_THRESH:
            errorMsgPtr = "The GN Threshold must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_THRESH:
            errorMsgPtr = "The SN Threshold must be between 0.0 and 1.0";
            break;
        case NH_INVALID_GN_WEIGHT:
            errorMsgPtr = "The GN Weight must be between 0.0 and 1.0";
            break;
        case NH_INVALID_NH_WEIGHT:
            errorMsgPtr = "The SN Weight must be between 0.0 and 1.0";
            break;

        case NH_INVALID_CULTURE_CODE:
            errorMsgPtr = "The specified culture code is invalid";
            break;
        case NH_ERROR_READING_CUSTOM_PARAMETER_FROM_FILE:
            errorMsgPtr = "A problem was encountered when reading a custom parameter from a file";
            break;
        case NH_ERROR_WRITING_CUSTOM_PARAMETER_TO_FILE:
            errorMsgPtr = "A problem was encountered when writing a custom parameter to a file";
            break;
        default:
            errorMsgPtr = "Unknown Error";
            break;
    }

    // textBuffer must hold at least maxChars + 1 characters, because the
    // terminator is written at index maxChars.
    strncpy(textBuffer, errorMsgPtr, maxChars);
    textBuffer[maxChars] = EOS;
}
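
The copy-then-terminate step at the end of NH_get_error_text can be illustrated in isolation. The following is a minimal sketch, not the NameHunter code itself: DemoCode, demoMessageFor and demoGetErrorText are hypothetical names introduced here, and the three messages are a small subset of the table above. It assumes, as the listing does, that the caller supplies a buffer of at least maxChars + 1 characters.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for NHReturnCode; the real enum is not part of this listing.
enum class DemoCode { Success, Match, NoMatch };

// Map a code to its message text (a small subset, for illustration only).
inline const char *demoMessageFor(DemoCode code) {
    switch (code) {
        case DemoCode::Success: return "Operation successful";
        case DemoCode::Match:   return "The comparison matched";
        case DemoCode::NoMatch: return "The comparison did not match";
    }
    return "Unknown Error";
}

// Copy at most maxChars characters of the message and always terminate.
// The caller must pass a buffer of at least maxChars + 1 bytes, because
// strncpy does not write a terminator when the source is longer than maxChars.
inline void demoGetErrorText(DemoCode code, char *textBuffer, std::size_t maxChars) {
    std::strncpy(textBuffer, demoMessageFor(code), maxChars);
    textBuffer[maxChars] = '\0';
}
```

Calling demoGetErrorText(DemoCode::Match, buf, 8) with a 9-byte buffer leaves the truncated but terminated text "The comp" in buf.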



// File: NH_culture_codes.cpp
//
// Description:
//
//    Definition of global array of culture code strings
//
//
// History:
//
//    9/12/97    EFB    Created
//    3/20/98    EFB    Changed names to NH from SN
//

#include <string.h>
#include "NH_culture_codes.h"

// The following two global arrays must be the same size.
// That is, they must have the same number of elements.
// If you add or remove items, you must also update the
// constant NH_NUM_CULTURE_CODES.
// In addition, they must maintain the same relative order
// (for example, Arabic must be in the same position in both
// arrays).
// Lastly, this stuff must match the NHParmsType enum type,
// both in number and relative position. The NH_NUM_PARMS_TYPES
// constant must also be kept in sync.

char *NH_culture_codes[] = { NH_CULTURE_CODE_ANGLO,
                             NH_CULTURE_CODE_ARABIC,
                             NH_CULTURE_CODE_CHINESE,
                             NH_CULTURE_CODE_GENERIC,
                             NH_CULTURE_CODE_HISPANIC,
                             NH_CULTURE_CODE_KOREAN,
                             NH_CULTURE_CODE_RUSSIAN };

char *NH_culture_strings[] = { NH_CULTURE_STRING_ANGLO,
                               NH_CULTURE_STRING_ARABIC,
                               NH_CULTURE_STRING_CHINESE,
                               NH_CULTURE_STRING_GENERIC,
                               NH_CULTURE_STRING_HISPANIC,
                               NH_CULTURE_STRING_KOREAN,
                               NH_CULTURE_STRING_RUSSIAN };

bool NH_validate_culture_code(NHCultureCode cultureCode)
{
    bool found = false;

    for (int i = 0; i < NH_NUM_CULTURE_CODES; i++) {
        if (!strncmp(cultureCode, NH_culture_codes[i], NH_MAX_CULTURE_CODE_LEN)) {
            found = true;
            break;
        }
    }
    return found;
}
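
The comments in this file ask the maintainer to keep the two arrays and the count constant in step by hand. As a hedged illustration only, the sketch below shows one way a compile-time check could guard that invariant. The code values ("A", "C", and so on), the display strings, and every kDemo/demo name are assumptions made for this example; the real NH_CULTURE_CODE_* and NH_CULTURE_STRING_* constants live in NH_culture_codes.h, which is not reproduced here.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical code values and display strings, in matching order.
static const char *const kDemoCultureCodes[] = {"A", "C", "E", "G", "H", "K", "R"};
static const char *const kDemoCultureStrings[] = {"Arabic",   "Chinese", "Anglo", "Generic",
                                                  "Hispanic", "Korean",  "Russian"};

constexpr std::size_t kDemoNumCodes =
    sizeof(kDemoCultureCodes) / sizeof(kDemoCultureCodes[0]);

// Fails to compile if the two parallel arrays ever drift apart in length.
static_assert(kDemoNumCodes == sizeof(kDemoCultureStrings) / sizeof(kDemoCultureStrings[0]),
              "culture code and culture string arrays must have the same length");

// Assumed maximum code length, playing the role of NH_MAX_CULTURE_CODE_LEN.
constexpr std::size_t kDemoMaxCodeLen = 2;

// Linear search over the code array, as in NH_validate_culture_code.
inline bool demoValidateCultureCode(const char *code) {
    for (std::size_t i = 0; i < kDemoNumCodes; ++i) {
        if (std::strncmp(code, kDemoCultureCodes[i], kDemoMaxCodeLen) == 0) {
            return true;
        }
    }
    return false;
}

// Return the display string at the same index as the code, or nullptr if unknown.
inline const char *demoCultureName(const char *code) {
    for (std::size_t i = 0; i < kDemoNumCodes; ++i) {
        if (std::strncmp(code, kDemoCultureCodes[i], kDemoMaxCodeLen) == 0) {
            return kDemoCultureStrings[i];
        }
    }
    return nullptr;
}
```

The static_assert catches a length mismatch at build time; it cannot detect two entries listed in different orders, so the ordering rule stated in the comments still has to be kept by hand.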



//
// File: namehunter.h
// Description:
//
//    shutdown and startup functions for the NameHunter system.
//    These are really just blind interfaces to the
//    NH_variant_taq_globals functions. We do this because
//    we want to hide the details of the variants and TAQs
//    from the API user.
//
// History:
//
//    9/9/97     EFB    Created
//    3/20/98    EFB    Changed names to NH from SN
//

#include "namehunter.h"
#include "NHVariantTable.hpp"
#include "NHTAQTable.hpp"
#include "NH_variant_taq_globals.h"
#include "NHDigraphBitmapArray.hpp"

extern NHVariantTable *NH_snVariantTable;
extern NHVariantTable *NH_gnVariantTable;
extern NHTAQTable     *NH_taqTable;

NHDigraphBitmapArray globalDigraphBitmapArray;

void NH_startup()
{
    NH_getVariantTable(NH_SURNAME_VARIANTS);
    NH_getVariantTable(NH_GIVENNAME_VARIANTS);
    NH_getTAQTable();
}

void NH_shutdown()
{
    if (NH_snVariantTable != NULL) {
        delete NH_snVariantTable;
        NH_snVariantTable = NULL;
    }

    if (NH_gnVariantTable != NULL) {
        delete NH_gnVariantTable;
        NH_gnVariantTable = NULL;
    }

    if (NH_taqTable != NULL) {
        delete NH_taqTable;
        NH_taqTable = NULL;
    }
}






// File: NHVariantTable.hpp
//
// Description:
//
//    Interface to the NHVariantTable class.
//
//
// History:
//
//    5/7/97     EFB    Created
//    6/23/97    EFB    Changed processing to get rid of variant types
//                      and assign an individual score for each variant pair.
//    6/23/97    EFB    Enhanced comments
//    9/9/97     EFB    Added support for a culture code in the variant object,
//                      which required changes to this object's interaction
//                      with the NHVariant class.
//    3/20/98    EFB    Changed names to NH from SN
//



/*
   Variant information consists of two names that are related, along
   with a designation of variant type, which describes how the two
   names are related.

   The following holds true in our model:

      if Name A is related to name B with varType V, then B is
      related to A with varType V.

   When constructing the table,
   only one of the pairs (A, B) or (B, A) should be entered. The
   internals will ensure that a request of "is B related to A" and
   a request of "is A related to B" will work.

   Name variants are single segments.

   Internally, we represent the information as a hash table of
   NH_VarHashTableRecord structures. Each of these structures
   contains a name string, plus a Variant object.
   Each Variant object (a separate class) has the following:

      NHVarId id;                                   // unique id for each variant
      byte numRelatedVariants;                      // number of other variants we are related to
      NHVarId variants[MAX_VARIANTS_PER_NAME]       // array of id's
      double varScores[MAX_VARIANTS_PER_NAME]       // score for each variant
                                                    // as related to this variant
      short int varCultures[MAX_VARIANTS_PER_NAME]  // culture for each variant
                                                    // as related to this variant

   The name of the variant is actually stored in the hash table node, rather
   than the variant object.

   There are three important functions in the VariantTable class:

      NHReturnCode addVariant(char *name1, char *name2,
                              double varScore, char *cultCode);
      NHVariant *getVariantObjectForName(char *name);
      NHVarId getVariantIdForName(char *name);

   The Variant has the method:

      double getVariantScoreForIdAndCulture(NHVarId varId,
                                            char *cultureCode);

   The variant table is built by multiple calls to addVariant() from the
   constructor. There is one call to addVariant() for each pair of names
   that are related.

   addVariant() takes 2 names that are related, along with a score and a
   culture code to describe the relationship.

   getVariantObjectForName() returns the NHVariant object associated with the
   name (or NULL).

   getVariantIdForName() returns the id associated with the name.
   Typically, a QueryNameData object gets a pointer to its variant object
   up front. Each time it gets compared to an EvalNameData object, it
   calls the getVariantIdForName() method to get an id, which it then passes
   to getVariantScoreForIdAndCulture() to see if the two are related.
*/



#ifndef NHVARIANTTABLE_HPP
#define NHVARIANTTABLE_HPP

#include "NHVariant.hpp"
#include "NH_get_error_text.h"

// define a const for end of string
#ifndef EOS
#define EOS '\0'
#endif

// how long can a variant be ?
#define NH_MAX_VARIANT_LEN 30

// define a type to specify the type of variant table
// types are defined by a combination of culture and
// name field.
enum NH_VARIANT_TABLE_TYPES {
    NH_SURNAME_VARIANTS,
    NH_GIVENNAME_VARIANTS,
    NH_EMPTY_VARIANTS
};

// define a record in the Variant hash table
typedef struct NH_VAR_HASH_TABLE_RECORD_T {
    char                               segment[NH_MAX_VARIANT_LEN + 1];
    NHVariant                         *variant;
    struct NH_VAR_HASH_TABLE_RECORD_T *next;    // pointer to next node in hash chain
} NH_VarHashTableRecord;

// Do not change without seeing member function hash().
#define NH_MAX_VAR_HASH_TABLE_NODES 907

// define a type that is a pointer to a NH_VarHashTableRecord
typedef NH_VarHashTableRecord *NH_VarHashTableRecordPtr;

// define a type that is a table (array) of NH_VarHashTableRecordPtr
typedef NH_VarHashTableRecordPtr NH_VariantHashTable[NH_MAX_VAR_HASH_TABLE_NODES];



class NHVariantTable
{
public:
    NHVariantTable(NH_VARIANT_TABLE_TYPES tableType);
    virtual ~NHVariantTable();

    // returns the NHVariant object associated with the name,
    // or NULL if there is no object for the name.
    NHVariant *getVariantObjectForName(char *name);

    // returns the NHVarId associated with the name. If there is
    // no variant for the name, the function returns NH_VAR_NOT_FOUND.
    NHVarId getVariantIdForName(char *name);

    NHReturnCode getStatus() {return status;}

    NHReturnCode addVariant(char *name1, char *name2, double varScore, char *cultCode);

    int getNumHashBuckets() {return NH_MAX_VAR_HASH_TABLE_NODES;}

    NH_VarHashTableRecordPtr getHashBucketStartNodeAt(int hashTableIndex)
        {return variantHashTable[hashTableIndex];}

    // function to change the score associated with two variants with a
    // specified culture.
    // The function returns:
    //
    //    NH_SUCCESS - if things worked out OK
    //    NH_VARIANTS_DONT_EXIST - if either name does not exist in the table
    //                             or the names are not already variants of each
    //                             other with the specified culture.
    //    NH_INVALID_VARIANT_SCORE - if the score is invalid
    NHReturnCode changeVariantScore(char *name1, char *name2, char *cultureCode, double newScore);

    // a function to remove the relationship between two variants within
    // a specified culture.
    // This function is used for the VariantManager application.
    // If either variant ends up without a relationship after this
    // operation, it is left in, but when saved, the resulting file
    // will contain a "*" rather than a related name. The function can
    // return
    //
    //    NH_SUCCESS - if things worked out OK
    //    NH_VARIANTS_DONT_EXIST - if the names are not already variants
    NHReturnCode removeVariantRelation(char *name1, char *name2, char *cultureCode);

    // return the next available id, which is the number of
    // distinct variants in our table.
    NHVarId getNextAvailableVarId() {return nextAvailableVarId;}

    bool getDirty() {return dirty;}
    void setDirty(bool aBool) {dirty = aBool;}

protected:
    // add a variant relationship.
    virtual NHVariant *getOrCreateVariantObjectForName(char *name);

    NHVarId             nextAvailableVarId;
    NH_VariantHashTable variantHashTable;
    NHReturnCode        status;    // are we valid
    bool                dirty;     // have we changed

    // Returns an integer in the range [0, NH_MAX_VAR_HASH_TABLE_NODES).
    inline unsigned int hash(char *string)
    {
        char *p;
        unsigned int i;
        unsigned int sum;

        for (p = string, i = 2, sum = 0; *p != EOS; p++, i += 2)
            sum += i * *p;
        return sum % NH_MAX_VAR_HASH_TABLE_NODES;
    } // hash

private:
};

#endif
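
The hash() member above weights each character by an even multiplier that grows with its position and reduces the sum modulo the bucket count. A standalone sketch of the same arithmetic follows, for illustration only. The names demoHash and kDemoBuckets are introduced here, and the cast to unsigned char is a choice made for this sketch so that the result does not depend on whether plain char is signed; the listing multiplies by *p directly.

```cpp
// Bucket count taken from NH_MAX_VAR_HASH_TABLE_NODES in the header above.
constexpr unsigned int kDemoBuckets = 907;

// Position-weighted character sum: the first character is weighted by 2,
// the second by 4, the third by 6, and so on, then reduced modulo the
// bucket count. The result is always in the range [0, kDemoBuckets).
inline unsigned int demoHash(const char *s) {
    unsigned int sum = 0;
    unsigned int weight = 2;
    for (const char *p = s; *p != '\0'; ++p, weight += 2) {
        sum += weight * static_cast<unsigned char>(*p);
    }
    return sum % kDemoBuckets;
}
```

For ASCII input, demoHash("A") is 2 * 65 = 130 and demoHash("AB") is 2 * 65 + 4 * 66 = 394. Because the weight depends on position, "AB" and "BA" land in different buckets, which a plain character sum would not achieve.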



// File: NHVariantTable.cpp
//
// Description:
//
//    Implementation to the NHVariantTable class.
//
//
// History:
//
//    5/14/97    EFB    Created
//    3/20/98    EFB    Changed names to NH from SN
//

#include <string.h>
#include <stdio.h>

#include "NHVariantTable.hpp"
#include "NH_util.hpp"
#include "NH_culture_codes.h"



NHVariantTable::NHVariantTable(NH_VARIANT_TABLE_TYPES tableType)
{
    status = NH_SUCCESS;
    dirty = false;

    // clear out the hash table
    for (int i = 0; i < NH_MAX_VAR_HASH_TABLE_NODES; i++)
        variantHashTable[i] = NULL;

    // initialize our variant id variable.
    nextAvailableVarId = 0;

    /* gnv test stuff
    addVariant("ED", "EDWARD", 0.7, "E ");
    addVariant("GERRY", "GENERIC", 0.7, "G ");
    addVariant("HOP", "HOPSING", 0.7, "C ");
    addVariant("NASSIR", "NARADMAN", 0.7, "A");
    addVariant("BORRIS", "NATASIA", 0.7, "R");
    addVariant("JUAN", "EPSTEIN", 0.7, "H ");
    addVariant("KORY", "KOREAN", 0.7, "H ");
    */

    /* snv test stuff
    addVariant("HUANG", "WONG", 0.7, "C ");
    */

    // the following include lines are commented out because it takes forever
    // to compile release versions when they are left in.
    if (tableType == NH_GIVENNAME_VARIANTS) {
        // #include "gnvdata.h"
    }
    else if (tableType == NH_SURNAME_VARIANTS) {
        // #include "snvdata.h"
    }
}



// release all the memory used to store NH_VarHashTableRecord pointers
NHVariantTable::~NHVariantTable()
{
    NH_VarHashTableRecordPtr prevRecord;
    NH_VarHashTableRecordPtr varRecord;
    unsigned int tableIndex;

    for (tableIndex = 0; tableIndex < NH_MAX_VAR_HASH_TABLE_NODES; tableIndex++) {
        varRecord = variantHashTable[tableIndex];
        while (varRecord != NULL) {
            prevRecord = varRecord;
            varRecord = varRecord->next;
            // delete the record we allocated,
            // as well as the NHVariant object pointed to by the
            // variant member of this record
            delete prevRecord->variant;
            delete prevRecord;
        }
    }
}



// returns the NHVariant object associated with the name,
// or NULL if there is no object for the name.
NHVariant *NHVariantTable::getVariantObjectForName(char *name)
{
    NHVariant *variantObject = NULL;
    unsigned int tableIndex;
    NH_VarHashTableRecordPtr tempRecordPtr;

    // find the hash value for the (possible) variant
    tableIndex = hash(name);

    // go through the records in the chain at that offset in the
    // hash table, and try to find the variant we are looking for.
    tempRecordPtr = variantHashTable[tableIndex];
    while (tempRecordPtr != NULL) {
        if (!strcmp(tempRecordPtr->segment, name)) {
            variantObject = tempRecordPtr->variant;
            break;
        }
        else // move on to next record in the chain
            tempRecordPtr = tempRecordPtr->next;
    }
    return variantObject;
}



// returns the NHVariant object associated with the name,
// or creates a new one.
NHVariant *NHVariantTable::getOrCreateVariantObjectForName(char *name)
{
    NHVariant *variantObject = getVariantObjectForName(name);

    if (variantObject == NULL) {
        // no object existed before, so create one and add it
        // to the hash table.
        unsigned int             tableIndex;
        NH_VarHashTableRecordPtr prevRecord;
        NH_VarHashTableRecordPtr newVariantHashTableRecord = new NH_VarHashTableRecord;

        variantObject = new NHVariant(nextAvailableVarId++);
        if (variantObject != NULL) {
            // find the hash value for the name
            tableIndex = hash(name);

            // fill up the values in the record
            strncpy(newVariantHashTableRecord->segment, name, NH_MAX_VARIANT_LEN);
            newVariantHashTableRecord->segment[NH_MAX_VARIANT_LEN] = EOS;
            newVariantHashTableRecord->variant = variantObject;
            newVariantHashTableRecord->next = NULL;

            // now add the new record to the chain of entries
            // at that index.
            prevRecord = variantHashTable[tableIndex];
            if (prevRecord == NULL)
                variantHashTable[tableIndex] = newVariantHashTableRecord;
            else {
                while (prevRecord->next != NULL) {
                    prevRecord = prevRecord->next;
                }
                prevRecord->next = newVariantHashTableRecord;
            }
        }
        else
            status = NH_VARIANT_ALLOC_ERROR;
    }
    return variantObject;
}



// returns the NHVarId associated with the name. If there is
// no variant for the name, the function returns NH_VAR_NOT_FOUND.
NHVarId NHVariantTable::getVariantIdForName(char *name)
{
    NHVariant *variantObject = getVariantObjectForName(name);
    NHVarId returnId;

    if (variantObject != NULL) {
        returnId = variantObject->getVariantId();
    }
    else
        returnId = NH_VAR_NOT_FOUND;
    return returnId;
}



// Add a variant relationship.
// In order to do this, we must:
//
//    - make sure both names already have entries in the hash table
//      and if not, create them.
//    - get the id of each entry.
//    - add the id of each item to the variant information of the other.
//
// We handle the special case where the second name is a *. This means
// that the name should be part of the variant table, but not related
// to anything. In this case,
// we only create (or get) a NHVariant object for the name.
NHReturnCode NHVariantTable::addVariant(char *name1, char *name2,
                                        double varScore,
                                        char *cultureCode)
{
    NHReturnCode rc = NH_SUCCESS;
    NHVariant *varObject1;
    NHVariant *varObject2;

    if ((varScore < 0.0) || (varScore > 1.0))
        rc = NH_INVALID_VARIANT_SCORE;
    else {
        if (NH_validate_culture_code(cultureCode)) {
            // Get variant object for both names. This will also create
            // a new entry if the name(s) were not in the table already
            varObject1 = getOrCreateVariantObjectForName(name1);
            // if the second name was a *, skip the creation of the second
            // NHVariant object and do not associate the names.
            if (strcmp(name2, "*")) {
                varObject2 = getOrCreateVariantObjectForName(name2);
                if ((varObject1 != NULL) && (varObject2 != NULL)) {
                    // now associate each with the other,
                    // using the supplied culture code and score
                    rc = varObject1->addVariant(varObject2, cultureCode, varScore);
                    if (rc == NH_SUCCESS)
                        rc = varObject2->addVariant(varObject1, cultureCode, varScore);
                }
            }
        }
        else {
            // flag it as an error, but do not mark the entire table as bad
            rc = NH_INVALID_CULTURE_CODE;
        }
    }
    return rc;
}



// function to change the score associated with two variants.
// The function returns:
//
//    NH_SUCCESS - if things worked out OK
//    NH_VARIANTS_DONT_EXIST - if either name does not exist in the table
//                             or the names are not already variants of each
//                             other
//    NH_INVALID_VARIANT_SCORE - if the score is invalid
NHReturnCode NHVariantTable::changeVariantScore(char *name1, char *name2, char *cultureCode, double newScore)
{
    NHReturnCode rc = NH_SUCCESS;

    if ((newScore < 0.0) || (newScore > 1.0))
        rc = NH_INVALID_VARIANT_SCORE;
    else {
        NHVariant *var1 = getVariantObjectForName(name1);
        NHVariant *var2 = getVariantObjectForName(name2);

        if ((var1 == NULL) || (var2 == NULL))
            rc = NH_VARIANTS_DONT_EXIST;
        else {
            rc = var1->setVariantScoreForIdAndCulture(var2->getVariantId(), cultureCode, newScore);
            if (rc == NH_SUCCESS)
                rc = var2->setVariantScoreForIdAndCulture(var1->getVariantId(), cultureCode, newScore);
            // we should never have a case where the items are related
            // in one direction but not the other.
        }
    }
    return rc;
}



// a function to remove the relationship between two variants.
// If either variant ends up without a relationship after this
// operation, it is left in, but when saved, the resulting file
// will contain a '*' rather than a related name. The function can
// return
//
//    NH_SUCCESS - if things worked out OK
//    NH_VARIANTS_DONT_EXIST - if the names are not already variants
NHReturnCode NHVariantTable::removeVariantRelation(char *name1,
                                                   char *name2, char *cultureCode)
{
    NHReturnCode rc = NH_VARIANTS_DONT_EXIST;
    NHVariant *var1 = getVariantObjectForName(name1);
    NHVariant *var2 = getVariantObjectForName(name2);

    if ((var1 == NULL) || (var2 == NULL))
        rc = NH_VARIANTS_DONT_EXIST;
    else {
        if (var1->removeVariant(var2->getVariantId(), cultureCode) ==
            NH_SUCCESS) {
            // we should never have a case where the items are related
            // in one direction but not the other.
            if (var2->removeVariant(var1->getVariantId(),
                                    cultureCode) == NH_SUCCESS)
                rc = NH_SUCCESS;
        }
    }

    return rc;

} 
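
The three table-level functions above keep every relationship symmetric: each add, change, or remove is applied in both directions, and scores outside 0.0 to 1.0 are rejected. The following is only a minimal sketch of that contract, not the NHVariantTable implementation. It swaps the NHVariant objects for a std::map, and the names SketchVariantTable, SketchCode, and SK_NOT_RELATED are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical stand-ins for the NH_* return codes in the listing.
enum SketchCode { SK_SUCCESS, SK_INVALID_SCORE, SK_DONT_EXIST, SK_ALREADY_RELATED };

const double SK_NOT_RELATED = -1.0;  // assumed sentinel for "no relationship"

// Each relationship is stored twice, once per direction, keyed by
// (name, related name, culture code).
class SketchVariantTable {
public:
    SketchCode add(const std::string &a, const std::string &b,
                   double score, const std::string &culture) {
        if (score < 0.0 || score > 1.0) return SK_INVALID_SCORE;
        if (links_.count(Key(a, b, culture))) return SK_ALREADY_RELATED;
        links_[Key(a, b, culture)] = score;
        links_[Key(b, a, culture)] = score;
        return SK_SUCCESS;
    }

    SketchCode change(const std::string &a, const std::string &b,
                      double score, const std::string &culture) {
        if (score < 0.0 || score > 1.0) return SK_INVALID_SCORE;
        if (!links_.count(Key(a, b, culture))) return SK_DONT_EXIST;
        links_[Key(a, b, culture)] = score;
        links_[Key(b, a, culture)] = score;
        return SK_SUCCESS;
    }

    SketchCode remove(const std::string &a, const std::string &b,
                      const std::string &culture) {
        if (!links_.count(Key(a, b, culture))) return SK_DONT_EXIST;
        // Invariant: a relationship always exists in both directions.
        assert(links_.count(Key(b, a, culture)));
        links_.erase(Key(a, b, culture));
        links_.erase(Key(b, a, culture));
        return SK_SUCCESS;
    }

    // Returns SK_NOT_RELATED when the pair is not related under this culture.
    double score(const std::string &a, const std::string &b,
                 const std::string &culture) const {
        std::map<Key, double>::const_iterator it = links_.find(Key(a, b, culture));
        return it == links_.end() ? SK_NOT_RELATED : it->second;
    }

private:
    typedef std::tuple<std::string, std::string, std::string> Key;
    std::map<Key, double> links_;
};
```

Storing both directions costs twice the entries but makes a lookup from either name a single search, which matches the paired calls in the listing.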



// File: NHVariant.hpp
//
// Description:
//
//    Interface to the NHVariant class.
//
//
// History:
//
//    6/6/97    EFB    Created
//    6/23/97   EFB    Changed processing to get rid of variant types
//                     and assign an individual score for each variant pair.
//    9/9/97    EFB    Changed object so that each relationship has an
//                     associated culture. Several access methods have
//                     been changed to allow for a culture specifier.
//    3/20/98   EFB    Changed names to NH from SN
//

/*
  Variant represents the variant information for one name.
  Currently, the name must be a single segment.
  The object contains the following information:
     NHVarId    id;                                  // unique id for this variant
     byte       numRelatedVariants;                  // how many variants are we related to?
     NHVarId    variantIds[MAX_VARIANTS_PER_NAME];   // what are the id's of our related variants
     double     varScores[MAX_VARIANTS_PER_NAME];    // Score for each variant
                                                     // in variants array above
     short int  varCultures[MAX_VARIANTS_PER_NAME];  // Two byte code describing the culture
                                                     // for this variant relationship. These are
                                                     // actually char[2] codes.
  A variant knows how to add an id, type combination to its
  information.
*/

#ifndef NHVARIANT_HPP
#define NHVARIANT_HPP

#include <stdlib.h>

#include "NH_get_error_text.h"
#include "NH_culture_codes.h"

typedef unsigned char byte;

// #define MAX_VARIANTS_PER_NAME 30
#define NH_INIT_VARIANTS_PER_NAME 5

// define a constant to represent that two variants were
// not related.
#define NH_VARIANTS_NOT_RELATED -1.0

// define a variant id as a short int.
typedef short int NHVarId;
#define NH_VAR_NOT_FOUND -1

// define a structure to hold the info about a related variant. We
// will use arrays of this structure to list the names related to
// a variant.
typedef struct NH_RELATED_VARIANTS_T {
    NHVarId variantId;    // what is the id of our related variant
    double  varScore;     // Score for this variant, as related to the main variant
                          // in variants array above
    char    varCulture[NH_MAX_CULTURE_CODE_LEN];
                          // Two byte code describing the culture
                          // for this variant relationship. These are
                          // actually char[2] codes.
} NH_RelatedVariants;



class NHVariant
{
public:
    NHVariant(NHVarId newId);
    virtual ~NHVariant();

    // Returns the variant score for the relationship between
    // the supplied variant id and the variant, within the specified
    // culture. If the variants are not related, the function returns
    // NH_VARIANTS_NOT_RELATED.
    double getVariantScoreForIdAndCulture(NHVarId relatedVarId,
                                          char *cultCode);

    // allows caller to search across cultures within this variant
    double getVariantScoreForIdAndAnyCulture(NHVarId relatedVarId,
                                             char *cultCode);

    // see if the supplied variant is related to us, and if so,
    // replace the existing score with the new score.
    // if not, return NH_VARIANTS_DONT_EXIST.
    NHReturnCode setVariantScoreForIdAndCulture(NHVarId relatedVarId,
                                                char *cultCode, double score);

    // adds the id of the specified variant (along with an associated
    // score and culture code) to our array of variants related to us.
    virtual NHReturnCode addVariant(NHVariant *variant,
                                    char *cultureCode,
                                    double relatedVarScore);

    // remove a variant from our list
    // return NH_VARIANTS_DONT_EXIST if the id is not in our list already
    virtual NHReturnCode removeVariant(NHVarId relatedVarId,
                                       char *cultureCode);

    // return the variant id for this object
    NHVarId getVariantId() {return id;}

    // return the number of variants related to this object
    byte getNumVariants() {return numRelatedVariants;}

    NHVarId getIdForRelatedVariant(int relVarIndex)
    {
        NHVarId varId = 0;

        if ((relVarIndex > -1) && (relVarIndex < numRelatedVariants))
            varId = relatedVariants[relVarIndex].variantId;
        return varId;
    }

    char *getCultureCodeForRelatedVariant(int relVarIndex)
    {
        char *cultureCode = NULL;

        if ((relVarIndex > -1) && (relVarIndex < numRelatedVariants))
            cultureCode = relatedVariants[relVarIndex].varCulture;
        return cultureCode;
    }

    double getScoreForRelatedVariant(int relVarIndex)
    {
        double score = 0.0;

        if ((relVarIndex > -1) && (relVarIndex < numRelatedVariants))
            score = relatedVariants[relVarIndex].varScore;
        return score;
    }

protected:
    NHVarId id;                           // unique id for this variant
    byte    numRelatedVariants;           // how many variants are we related to?
    byte    maxRelatedVariants;           // how many variants do we have room for?
    NH_RelatedVariants *relatedVariants;

private:
};



#endif 



// File: NHVariant.cpp
//
// Description:
//
//    Implementation to the NHVariant class.
//
//
// History:
//
//    6/6/97    EFB    Created
//    3/20/98   EFB    Changed names to NH from SN
//

#include <string.h>
#include <stdio.h>

#include "NHVariant.hpp"
#include "NH_util.hpp"

#ifndef false
#define false 0
#endif

#ifndef true
#define true 1
#endif

NHVariant::NHVariant(NHVarId newId)
{
    id = newId;
    numRelatedVariants = 0;
    maxRelatedVariants = NH_INIT_VARIANTS_PER_NAME;
    relatedVariants = new NH_RelatedVariants[maxRelatedVariants];
}

NHVariant::~NHVariant()
{
    if (relatedVariants)
        delete [] relatedVariants;
}

// see if the supplied variant is related to us, and if so, return
// its score.
double NHVariant::getVariantScoreForIdAndCulture(NHVarId relatedVarId,
                                                 char *cultCode)
{
    double returnScore = NH_VARIANTS_NOT_RELATED;

    for (int i = 0; i < numRelatedVariants; i++) {
        if ((relatedVariants[i].variantId == relatedVarId) &&
            (memcmp(relatedVariants[i].varCulture, cultCode,
                    NH_MAX_CULTURE_CODE_LEN) == 0)) {
            returnScore = relatedVariants[i].varScore;
            break;
        }
    }

    return returnScore;
}



// See if the supplied variant is related to us under any culture.
// Because this method is intended to be called several times (for
// possibly multiple cultures), it also takes a culture string that
// is used to keep track of the last culture that was returned. The
// first time the function is called, the culture is specified as an
// empty string. On return, it contains the first culture found
// in the list for the id. The next time the function is called,
// we look past that culture/id combination in the array looking for
// the next one, until we return NH_VARIANTS_NOT_RELATED.
double NHVariant::getVariantScoreForIdAndAnyCulture(NHVarId relatedVarId,
                                                    char *cultCode)
{
    double returnScore = NH_VARIANTS_NOT_RELATED;
    bool alreadyFoundLastCultCode = false;

    for (int i = 0; i < numRelatedVariants; i++) {
        if ((relatedVariants[i].variantId == relatedVarId)) {
            // ids matched, so see if they specified a culture code
            if (*cultCode == EOS) {
                // this is first time through, so no check is necessary.
                // copy the cult code into the supplied string.
                NH_safe_strcpy(cultCode,
                               relatedVariants[i].varCulture, NH_MAX_CULTURE_CODE_LEN);
                returnScore = relatedVariants[i].varScore;
                break;
            }
            else {
                // this is not first time through, they are passing us the cult code
                // that was found last time, so see if we have already found that one
                if (alreadyFoundLastCultCode == true) {
                    NH_safe_strcpy(cultCode,
                                   relatedVariants[i].varCulture, NH_MAX_CULTURE_CODE_LEN);
                    returnScore = relatedVariants[i].varScore;
                    break;
                }
                else {
                    // see if this is the cult code they passed us
                    if (memcmp(relatedVariants[i].varCulture,
                               cultCode, NH_MAX_CULTURE_CODE_LEN) == 0) {
                        alreadyFoundLastCultCode = true;    // we found it
                    }
                }
            }
        }
    }

    return returnScore;

} 
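
The culture string passed to getVariantScoreForIdAndAnyCulture acts as an in/out cursor, so a caller can walk every culture under which two variants are related by calling it repeatedly until NH_VARIANTS_NOT_RELATED comes back. The sketch below illustrates that calling pattern on a plain vector. SketchRelated, scoreForIdAndAnyCulture, culturesFor, the two-character culture length, and the -1.0 sentinel are assumptions made for illustration and are not taken from the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

const double kNotRelated = -1.0;  // assumed value of NH_VARIANTS_NOT_RELATED
const int kCultLen = 2;           // assumed culture-code length

struct SketchRelated {
    short id;
    double score;
    char culture[kCultLen + 1];
};

// cultCode is in/out: empty on the first call, and on return it holds the
// culture that was matched. Later calls skip past that culture.
double scoreForIdAndAnyCulture(const std::vector<SketchRelated> &rel,
                               short relatedId, char *cultCode)
{
    assert(cultCode != NULL);
    bool foundLast = false;
    for (std::size_t i = 0; i < rel.size(); i++) {
        if (rel[i].id != relatedId)
            continue;
        if (*cultCode == '\0' || foundLast) {
            std::strncpy(cultCode, rel[i].culture, kCultLen);
            cultCode[kCultLen] = '\0';
            return rel[i].score;
        }
        if (std::memcmp(rel[i].culture, cultCode, kCultLen) == 0)
            foundLast = true;  // the next match is the one to return
    }
    return kNotRelated;
}

// Caller-side loop: collect every culture under which relatedId is related.
std::vector<std::string> culturesFor(const std::vector<SketchRelated> &rel,
                                     short relatedId)
{
    std::vector<std::string> out;
    char cult[kCultLen + 1] = "";
    while (scoreForIdAndAnyCulture(rel, relatedId, cult) != kNotRelated)
        out.push_back(cult);
    return out;
}
```

Each call rescans the array from the start, so enumerating k cultures costs k passes. With only a handful of related variants per name that is a reasonable trade for keeping no iterator state inside the object.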



// see if the supplied variant is related to us, and if so,
// replace the existing score with the new score.
// if not, return NH_VARIANTS_DONT_EXIST.
NHReturnCode NHVariant::setVariantScoreForIdAndCulture(NHVarId relatedVarId,
                                                       char *cultCode, double score)
{
    NHReturnCode rc = NH_VARIANTS_DONT_EXIST;

    for (int i = 0; i < numRelatedVariants; i++) {
        if ((relatedVariants[i].variantId == relatedVarId) &&
            (memcmp(relatedVariants[i].varCulture, cultCode,
                    NH_MAX_CULTURE_CODE_LEN) == 0)) {
            relatedVariants[i].varScore = score;
            rc = NH_SUCCESS;
            break;
        }
    }

    return rc;
}

// add a variant to our list
// if the variant is already in the list, do not add it a second
// time, and return an error
NHReturnCode NHVariant::addVariant(NHVariant *variant, char *cultureCode,
                                   double relatedVarScore)
{
    NHReturnCode rc = NH_SUCCESS;
    NHVarId relatedVarId = variant->getVariantId();

    // check to see if the relationship has already been
    // defined for this id/culture.
    for (int i = 0; i < numRelatedVariants; i++) {
        if ((relatedVariants[i].variantId == relatedVarId) &&
            (memcmp(relatedVariants[i].varCulture,
                    cultureCode, NH_MAX_CULTURE_CODE_LEN) == 0)) {
            rc = NH_VARIANT_ALREADY_RELATED;
            break;
        }
    }

    if (rc == NH_SUCCESS) {
        // see if we are maxed out
        if (numRelatedVariants == maxRelatedVariants) {
            // try to reallocate the space
            NH_RelatedVariants *biggerBlock;

            biggerBlock = new NH_RelatedVariants[maxRelatedVariants * 2];
            if (biggerBlock) {
                memcpy(biggerBlock, relatedVariants,
                       sizeof(NH_RelatedVariants) * maxRelatedVariants);
                delete [] relatedVariants;
                relatedVariants = biggerBlock;
                maxRelatedVariants *= 2;
            }
            else
                rc = NH_MAX_VARIANT_SIZE_INCREMENT_FAILED;
        }
    }

    if (rc == NH_SUCCESS) {
        relatedVariants[numRelatedVariants].variantId = relatedVarId;
        relatedVariants[numRelatedVariants].varScore = relatedVarScore;
        strncpy(relatedVariants[numRelatedVariants].varCulture,
                cultureCode, NH_MAX_CULTURE_CODE_LEN);
        numRelatedVariants++;
    }

    return rc;
}


// remove a variant from our list
// return NH_VARIANTS_DONT_EXIST if the id is not in our list already
NHReturnCode NHVariant::removeVariant(NHVarId relatedVarId, char *cultureCode)
{
    NHReturnCode rc = NH_VARIANTS_DONT_EXIST;

    for (int i = 0; i < numRelatedVariants; i++) {
        if ((relatedVariants[i].variantId == relatedVarId) &&
            (memcmp(relatedVariants[i].varCulture,
                    cultureCode, NH_MAX_CULTURE_CODE_LEN) == 0)) {
            // now move any ids past the one that match
            // back one space.
            for (int j = i + 1; j < numRelatedVariants; j++)
            {
                relatedVariants[j - 1].varScore =
                    relatedVariants[j].varScore;
                relatedVariants[j - 1].variantId =
                    relatedVariants[j].variantId;
                strncpy(relatedVariants[j - 1].varCulture,
                        relatedVariants[j].varCulture, NH_MAX_CULTURE_CODE_LEN);
            }
            numRelatedVariants--;    // we now have one less variant
            rc = NH_SUCCESS;
            break;
        }
    }

    return rc;
}
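
NHVariant::addVariant and NHVariant::removeVariant together maintain a compact list: duplicates of an (id, culture) pair are refused, the storage grows when full, and removal closes the gap by shifting later entries down. A minimal sketch of the same behaviour follows. It deliberately uses std::vector, which grows its own storage, in place of the manual block doubling in the listing, and SketchLink and SketchVariant are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SketchLink {
    short id;
    std::string culture;
    double score;
};

class SketchVariant {
public:
    // Returns false if the (id, culture) pair is already present.
    bool add(short id, const std::string &culture, double score) {
        if (find(id, culture) >= 0)
            return false;
        SketchLink link = { id, culture, score };
        links_.push_back(link);  // vector reallocates when capacity runs out
        return true;
    }

    // Returns false if the (id, culture) pair is not present.
    bool remove(short id, const std::string &culture) {
        int i = find(id, culture);
        if (i < 0)
            return false;
        assert(static_cast<std::size_t>(i) < links_.size());
        links_.erase(links_.begin() + i);  // shifts later entries down by one
        return true;
    }

    std::size_t count() const { return links_.size(); }

private:
    // Linear search; returns the index of the match or -1.
    int find(short id, const std::string &culture) const {
        for (std::size_t i = 0; i < links_.size(); i++)
            if (links_[i].id == id && links_[i].culture == culture)
                return static_cast<int>(i);
        return -1;
    }

    std::vector<SketchLink> links_;
};
```

A linear search is adequate here because the listing starts each name with room for only five related variants and doubles from there.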



// File: NHTAQTable.hpp
//
// Description:
//
//    Interface to the NHTAQTable class.
//
//
// History:
//
//    5/7/97    EFB    Created
//    3/20/98   EFB    Changed names to NH from SN
//
//
// The TAQTable is organized by name and culture. That is the unique key
// in the table. We do lookups by hashing the name, but must consider the
// culture code as we walk the hash table bucket.

#ifndef NHTAQTABLE_HPP 
#define NHTAQTABLE_HPP 



#include "NH_culture_codes.h"
#include "NHNameData.hpp"
#include "NH_get_error_text.h"

// how many characters can a TAQ value be?
#define NH_MAX_TAQ_LEN 20

// define the possible values for the TAQ action
#define NH_TAQ_ACTION_DELETE 'X'
#define NH_TAQ_ACTION_DISREGARD 'D'

// define a record in the hash table of TAQ values
typedef struct NH_TAQ_RECORD_T {
    char taqString[NH_MAX_TAQ_LEN + 1];             // string that is the TAQ value
    char taqType;                                   // P, S, I, T or Q
    char sepIfConjoined;                            // Y or N
    char gnAction;                                  // what to do when found in gn
    char snAction;                                  // what to do when found in sn
    char taqCulture[NH_MAX_CULTURE_CODE_LEN + 1];   // which culture does this apply to?
    struct NH_TAQ_RECORD_T *next;                   // pointer to next TAQ record in this hash branch
} NH_TAQRecord;

// Do not change without seeing function NH_TAQhash()
#define NH_MAX_TAQ_HASH_NODES 907

// define a type that is a pointer to a NH_TAQRecord
typedef NH_TAQRecord *NH_TAQRecordPtr;

// define a type that is a table (array) of NH_TAQRecordPtrs
typedef NH_TAQRecordPtr NH_TAQHashTable[NH_MAX_TAQ_HASH_NODES];

enum NH_TAQ_TABLE_TYPE {
    NH_PRODUCTION_TAQ_TABLE,
    NH_EMPTY_TAQ_TABLE
};


class NHTAQTable
{
public:
    NHTAQTable(NH_TAQ_TABLE_TYPE type);
    ~NHTAQTable();

    // function to return a pointer to the TAQ structure for the
    // supplied character string (segment), cultureCode combination.
    // Returns NULL if the supplied segment is not known to the TAQ table
    // for the specified culture code.
    NH_TAQRecordPtr getTAQSegment(char *nameSeg,
                                  char *cultureCode);

    // specialized version of the above function that looks for the
    // name segment in either of the specified culture codes. It makes
    // sure that if the name is found in the primaryCultureCode, that one
    // gets returned even if we come upon the secondaryCultureCode first.
    NH_TAQRecordPtr getTAQSegment(char *nameSeg,
                                  char *primaryCultureCode,
                                  char *secondaryCultureCode);

    NHReturnCode    getStatus() {return status;}
    bool            getDirty() {return dirty;}
    void            setDirty(bool aBool) {dirty = aBool;}
    int             getNumHashBuckets() {return NH_MAX_TAQ_HASH_NODES;}
    NH_TAQRecordPtr getHashBucketStartNodeAt(int hashTableIndex)
                        {return taqHashTable[hashTableIndex];}
    NHReturnCode    addTAQValue(char *taqValue, char taqType,
                                char sepIfConjoined, char gnTAQAction,
                                char snTAQAction, char *taqCulture);
    NHReturnCode    removeTAQValue(char *taqValue, char *cultureCode);

protected:

private:
    // Returns an integer in the range 0 to NH_MAX_TAQ_HASH_NODES - 1.
    inline unsigned int hash(char *string)
    {
        char *p;
        unsigned int i;
        unsigned int sum;

        for (p = string, i = 2, sum = 0; *p != EOS; p++, i += 2)
            sum += i * *p;
        return sum % NH_MAX_TAQ_HASH_NODES;
    } /* hash */

    NH_TAQHashTable taqHashTable;
    NHReturnCode    status;         // are we valid
    bool            dirty;          // have we changed
};



#endif 
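
The private hash() member of NHTAQTable weights each character by twice its 1-based position before summing, so strings with the same letters in a different order land in different buckets more often than with a plain character sum. The stand-alone sketch below reproduces that arithmetic so it can be checked by hand. The function name taqHash and the cast to unsigned char are assumptions; the listing multiplies by a plain char, which can differ for characters above 127 on platforms where char is signed.

```cpp
#include <cassert>
#include <cstddef>

const unsigned int kTaqHashNodes = 907;  // NH_MAX_TAQ_HASH_NODES in the listing

// Position-weighted character sum: weights are 2, 4, 6, ... per position.
unsigned int taqHash(const char *s)
{
    assert(s != NULL);
    unsigned int sum = 0;
    unsigned int weight = 2;
    for (; *s != '\0'; ++s, weight += 2)
        sum += weight * static_cast<unsigned char>(*s);
    return sum % kTaqHashNodes;
}
```

For ASCII input, "MR" gives 2*77 + 4*82 = 482, while the reversed "RM" gives 2*82 + 4*77 = 472. "MRS" gives 482 + 6*83 = 980, which reduces to 73 modulo 907.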



// File: NHTAQTable.cpp
//
// Description:
//
//    Implementation to the NHTAQTable class.
//
//
// History:
//
//    5/14/97   EFB    Created
//    9/9/97    EFB    Added support for culture
//    3/20/98   EFB    Changed names to NH from SN
//

#include <string.h>
#include <stdio.h>

#include "NHTAQTable.hpp"
#include "NH_util.hpp"

NHTAQTable::NHTAQTable(NH_TAQ_TABLE_TYPE type)
{
    status = NH_SUCCESS;

    // clear out the hash table
    for (int i = 0; i < NH_MAX_TAQ_HASH_NODES; i++)
        taqHashTable[i] = NULL;

    // make sure we are not supposed to be doing an empty table.
    if (type == NH_PRODUCTION_TAQ_TABLE) {
        // parameters are:
        //
        //    TAQ string
        //    taq Type (T, P, S, Q, I)
        //    sepIfConjoined ('Y' or 'N')
        //    Given name action (D - delete, R - disregard, X - not applicable)
        //    Surname action (D - delete, R - disregard, X - not applicable)
        //    Culture (2 char code)

        // include the data that was generated via the TAQmanager tool.
#include "taqdata.h"

        // This stuff is just left over from testing
        /*
        addTAQValue("DR", 'T', 'N', NH_TAQ_ACTION_DELETE,
                    NH_TAQ_ACTION_DELETE, NH_CULTURE_CODE_GENERIC);
        addTAQValue("MR", 'T', 'N', NH_TAQ_ACTION_DELETE,
                    NH_TAQ_ACTION_DELETE, NH_CULTURE_CODE_GENERIC);
        addTAQValue("MRS", 'T', 'N', NH_TAQ_ACTION_DELETE,
                    NH_TAQ_ACTION_DELETE, NH_CULTURE_CODE_GENERIC);
        addTAQValue("JR", 'Q', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_GENERIC);
        addTAQValue("SR", 'Q', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_GENERIC);
        addTAQValue("ABDUL", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_ARABIC);
        addTAQValue("HOMEY", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_ANGLO);
        addTAQValue("CHINTAQ", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_CHINESE);
        addTAQValue("HISPTAQ", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_HISPANIC);
        addTAQValue("KORTAQ", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_KOREAN);
        addTAQValue("RUSTAQ", 'T', 'N', NH_TAQ_ACTION_DISREGARD,
                    NH_TAQ_ACTION_DISREGARD, NH_CULTURE_CODE_RUSSIAN);
        */
    }

    // mark that the table has not been changed. Useful for
    // TAQManager application
    dirty = false;
}



// release all the memory used to store the NH_TAQRecords
NHTAQTable::~NHTAQTable()
{
    NH_TAQRecord *prevTAQRecord;
    NH_TAQRecord *taqRecord;
    int tableIndex;

    for (tableIndex = 0; tableIndex < NH_MAX_TAQ_HASH_NODES;
         tableIndex++) {
        taqRecord = taqHashTable[tableIndex];
        while (taqRecord != NULL) {
            prevTAQRecord = taqRecord;
            taqRecord = taqRecord->next;
            delete prevTAQRecord;
        }
    }
}

// function to take the values passed in, create a NH_TAQRecord
// structure, and add the new structure to this object's
// taqHashTable.
NHReturnCode NHTAQTable::addTAQValue(char *taqValue, char taqType,
                                     char sepIfConjoined,
                                     char gnTAQAction, char snTAQAction,
                                     char *taqCulture)
{
    NHReturnCode rc = NH_SUCCESS;
    NH_TAQRecord *newTAQRecord;
    int tableIndex;
    NH_TAQRecord *prevTAQRecord;

    // first, make sure we know the culture code
    if (NH_validate_culture_code(taqCulture)) {
        // find the hash value for the taq
        tableIndex = hash(taqValue);

        // now see if the taq is already defined for this culture code
        // At the same time, find our insertion point, which will be either:
        //    - the first node in the bucket, if this bucket is empty
        //    - the end of the bucket
        prevTAQRecord = taqHashTable[tableIndex];
        if (prevTAQRecord != NULL) {
            rc = NH_TAQ_ALREADY_EXISTS;    // assume it exists
            while (strcmp(prevTAQRecord->taqString, taqValue) ||
                   (strcmp(prevTAQRecord->taqCulture, taqCulture))) {
                if (prevTAQRecord->next == NULL) {
                    rc = NH_SUCCESS;    // does not exist, so looks good so far
                    break;              // end of bucket chain
                }
                prevTAQRecord = prevTAQRecord->next;    // move through bucket chain
            }
        }

        // if all is still ok (e.g. no duplicate)
        if (rc == NH_SUCCESS) {
            // now create the new record and set its values
            newTAQRecord = new NH_TAQRecord;
            if (newTAQRecord != NULL) {
                NH_safe_strcpy(newTAQRecord->taqString,
                               taqValue, NH_MAX_TAQ_LEN);
                newTAQRecord->taqType = taqType;
                newTAQRecord->sepIfConjoined = sepIfConjoined;
                newTAQRecord->gnAction = gnTAQAction;
                newTAQRecord->snAction = snTAQAction;
                NH_safe_strcpy(newTAQRecord->taqCulture,
                               taqCulture, NH_MAX_CULTURE_CODE_LEN);
                newTAQRecord->next = NULL;

                // now add the new record to the chain of entries (or the start of the
                // bucket). We have already hashed the tableIndex value above, and have
                // found the correct insertion point
                if (prevTAQRecord == NULL)
                    taqHashTable[tableIndex] = newTAQRecord;
                else
                    prevTAQRecord->next = newTAQRecord;
            }
            else {
                rc = NH_TAQ_RECORD_ALLOC_ERROR;
                status = NH_TAQ_RECORD_ALLOC_ERROR;
            }
        }
    }
    else {
        // flag it as an error, but do not mark the entire table as bad
        rc = NH_INVALID_CULTURE_CODE;
    }

    return rc;
}


NH_TAQRecordPtr NHTAQTable::getTAQSegment(char *nameSeg, char *cultureCode)
{
    int tableIndex;
    NH_TAQRecordPtr tempTAQRecordPtr;
    NH_TAQRecordPtr returnTAQRecordPtr = NULL;

    // find the hash value for the (possible) taq
    tableIndex = hash(nameSeg);

    // go through the records in the chain at that offset in the
    // hash table, and try to find the taq we are looking for.
    tempTAQRecordPtr = taqHashTable[tableIndex];
    while (tempTAQRecordPtr != NULL) {
        if (!strcmp(tempTAQRecordPtr->taqString, nameSeg) &&
            !strcmp(tempTAQRecordPtr->taqCulture, cultureCode)) {
            returnTAQRecordPtr = tempTAQRecordPtr;
            break;
        }
        else    // move on to next record in the chain
            tempTAQRecordPtr = tempTAQRecordPtr->next;
    }

    return returnTAQRecordPtr;
}



// specialized version of the above function that looks for the
// name segment in either of the specified culture codes. It makes
// sure that if the name is found in the primaryCultureCode, that one
// gets returned even if we come upon the secondaryCultureCode first.
NH_TAQRecordPtr NHTAQTable::getTAQSegment(char *nameSeg,
                                          char *primaryCultureCode,
                                          char *secondaryCultureCode)
{
    int tableIndex;
    NH_TAQRecordPtr tempTAQRecordPtr;
    NH_TAQRecordPtr returnTAQRecordPtr = NULL;

    // find the hash value for the (possible) taq
    tableIndex = hash(nameSeg);

    // go through the records in the chain at that offset in the
    // hash table, and try to find the taq we are looking for.
    tempTAQRecordPtr = taqHashTable[tableIndex];
    while (tempTAQRecordPtr != NULL) {
        if (!strcmp(tempTAQRecordPtr->taqString, nameSeg) &&
            !strcmp(tempTAQRecordPtr->taqCulture, primaryCultureCode)) {
            returnTAQRecordPtr = tempTAQRecordPtr;
            break;
        }
        else    // move on to next record in the chain
            tempTAQRecordPtr = tempTAQRecordPtr->next;
    }

    // see if we need to check the secondary
    if (returnTAQRecordPtr == NULL) {
        // go through the records in the chain at that offset in the
        // hash table, and try to find the taq we are looking for.
        tempTAQRecordPtr = taqHashTable[tableIndex];
        while (tempTAQRecordPtr != NULL) {
            if (!strcmp(tempTAQRecordPtr->taqString, nameSeg) &&
                !strcmp(tempTAQRecordPtr->taqCulture, secondaryCultureCode)) {
                returnTAQRecordPtr = tempTAQRecordPtr;
                break;
            }
            else    // move on to next record in the chain
                tempTAQRecordPtr = tempTAQRecordPtr->next;
        }
    }

    return returnTAQRecordPtr;
}

// try to remove the TAQ value specified. If found, return
// NH_SUCCESS. If not found, return NH_TAQ_NOT_FOUND.
// The record is deleted if found.
NHReturnCode NHTAQTable::removeTAQValue(char *taqValue, char *cultureCode)
{
    NHReturnCode rc = NH_TAQ_NOT_FOUND;
    NH_TAQRecordPtr tempTAQRecordPtr;
    NH_TAQRecordPtr prevTAQRecordPtr = NULL;
    int tableIndex = hash(taqValue);

    // go through the records in the chain at that offset in the
    // hash table, and try to find the taq we are looking for.
    tempTAQRecordPtr = taqHashTable[tableIndex];
    while (tempTAQRecordPtr != NULL) {
        if (!strcmp(tempTAQRecordPtr->taqString, taqValue) &&
            !strcmp(tempTAQRecordPtr->taqCulture, cultureCode))
            break;
        else {
            // save this as the prev
            prevTAQRecordPtr = tempTAQRecordPtr;
            // move on to next record in the chain
            tempTAQRecordPtr = tempTAQRecordPtr->next;
        }
    }

    // once we are here, tempTAQRecordPtr will be non NULL
    // if we found it.
    if (tempTAQRecordPtr != NULL) {
        if (prevTAQRecordPtr == NULL) {
            // this record was the first in the chain, so we must alter
            // the hash table entry
            taqHashTable[tableIndex] = tempTAQRecordPtr->next;
        }
        else    // not the first in the chain, so assign the previous one's next
            prevTAQRecordPtr->next = tempTAQRecordPtr->next;    // to our next

        delete tempTAQRecordPtr;
        rc = NH_SUCCESS;
    }

    return rc;
}
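
The three-argument getTAQSegment makes two passes over a single bucket chain so that a record for the primary culture wins even when a record for the secondary culture sits earlier in the chain. The sketch below isolates that two-pass rule on a plain vector that stands in for one bucket. SketchTaq, findTaq, and the culture codes used with it are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SketchTaq {
    std::string text;     // the TAQ string
    std::string culture;  // culture code the record applies to
    char type;            // P, S, I, T or Q
};

// Pass 1 looks only for the primary culture. Pass 2 runs only if pass 1
// found nothing, and looks for the secondary culture.
const SketchTaq *findTaq(const std::vector<SketchTaq> &bucket,
                         const std::string &seg,
                         const std::string &primary,
                         const std::string &secondary)
{
    for (std::size_t i = 0; i < bucket.size(); i++)
        if (bucket[i].text == seg && bucket[i].culture == primary)
            return &bucket[i];
    for (std::size_t i = 0; i < bucket.size(); i++)
        if (bucket[i].text == seg && bucket[i].culture == secondary)
            return &bucket[i];
    return NULL;
}
```

A single pass that remembered the first secondary match would visit the chain once instead of twice. The two-pass form mirrors the listing and is arguably easier to read, and bucket chains are expected to be short.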



    // have
    if ((numGnSegments < NH_MAX_SEGS_BEFORE_TAQ) &&
        (*(gnSegments[numGnSegments].segString) != EOS)) {
        gnSegments[numGnSegments].status = NH_NAME_FIELD_STATUS_KNOWN;
        *outChar = EOS;      // terminate the last segment
        numGnSegments++;     // look at next segment
    }

    // now do the surname
    numSnSegments = 0;
    inChar = sn;
    outChar = snSegString;
    *outChar = EOS;
    snSegments[0].segString = outChar;
    while ((*inChar != EOS) && (numSnSegments < NH_MAX_SEGS_BEFORE_TAQ)) {
        // if this is a noise character, just move on to the next
        // one in the name
        if (strchr(noiseChars, *inChar))
            inChar++;
        else {
            if (strchr(segDelimChars, *inChar)) {
                // make sure this is not the next in a series of white spaces
                if (*(snSegments[numSnSegments].segString) != EOS) {
                    snSegments[numSnSegments].status = NH_NAME_FIELD_STATUS_KNOWN;
                    *outChar = EOS;      // terminate the last segment
                    numSnSegments++;     // look at next segment
                    // make sure we are not past the max number of segments
                    if (numSnSegments >= NH_MAX_SEGS_BEFORE_TAQ)
                        break;
                    inChar++;            // look at next char in name
                    outChar++;           // point to next available space in the output array
                    snSegments[numSnSegments].segString = outChar;
                    *outChar = EOS;      // init the new segment
                }
                else
                    // this is a segDelim char, and so was the last one.
                    // so just ignore it, and move on
                    inChar++;
            }
            else {
                // just a regular character, so add it to the segment we are
                // working on currently
                *outChar = toupper(*inChar);
                outChar++;           // write to next character in segment next time
                inChar++;            // look at next char in name
            }
        }
    }

    // if we get here, it is because we reached the end of the sn string.
    // If we were in the middle of building a name segment, we should
    // terminate the segment and increase the number of segments we have
    if ((numSnSegments < NH_MAX_SEGS_BEFORE_TAQ) &&
        (*(snSegments[numSnSegments].segString) != EOS)) {
        snSegments[numSnSegments].status = NH_NAME_FIELD_STATUS_KNOWN;
        *outChar = EOS;      // terminate the last segment
        numSnSegments++;     // look at next segment
    }

    // now see if there are any segments at all
    // in the fields. If not, we should create a
    // single blank segment, and mark its status as
    // unknown. If there are segments, we need to check for the
    // special values NFN, NLN, NMN, FNU, LNU, MNU. If we find these,
    // blank out the segment, and set the status
    // appropriately.
    // When a name field has more than one segment, but still
    // specifies one of these values, we still blank it out,
    // but we keep the segment as a blank segment. Although the
    // digraph score for this segment will be largely determined by
    // the UNKNOWN or NONE parameter, it still gets treated as a
    // segment in that oops and anchor val can be applied, and
    // it still gets sent to best score.
    // We do not currently look across name fields for these markers.
    // That is, we look for NFN, NMN, FNU, MNU in the given name field
    // and we look for NLN and LNU in the surname field.
    // ??? Future versions may look across name fields.

    if (numGnSegments == 0) {
        numGnSegments = 1;
        gnSegments[0].segString = "";
        gnSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
    }
    else if (nameParms->getCheckGnUnknowns()) {
        for (i = 0; i < numGnSegments; i++) {
            if (!strcmp(gnSegments[i].segString, "NFN")) {
                gnSegments[i].segString[0] = EOS;
                gnSegments[i].status = NH_NAME_FIELD_STATUS_NON_EXISTANT;
            } else if (!strcmp(gnSegments[i].segString, "FNU")) {
                gnSegments[i].segString[0] = EOS;
                gnSegments[i].status = NH_NAME_FIELD_STATUS_UNKNOWN;
            } else if (!strcmp(gnSegments[i].segString, "NMN")) {
                gnSegments[i].segString[0] = EOS;
                gnSegments[i].status = NH_NAME_FIELD_STATUS_NON_EXISTANT;
            } else if (!strcmp(gnSegments[i].segString, "MNU")) {
                gnSegments[i].segString[0] = EOS;
                gnSegments[i].status = NH_NAME_FIELD_STATUS_UNKNOWN;
            }
        }
    }

    // now the sn segs
    if (numSnSegments == 0) {
        numSnSegments = 1;
        snSegments[0].segString = "";
        snSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
    }
    else if (nameParms->getCheckSnUnknowns()) {
        for (i = 0; i < numSnSegments; i++) {
            if (!strcmp(snSegments[i].segString, "NLN")) {
                snSegments[i].segString[0] = EOS;
                snSegments[i].status = NH_NAME_FIELD_STATUS_NON_EXISTANT;
            } else if (!strcmp(snSegments[i].segString, "LNU")) {
                snSegments[i].segString[0] = EOS;
                snSegments[i].status = NH_NAME_FIELD_STATUS_UNKNOWN;
            }
        }
    }
}
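
// The marker handling above can be looked at in isolation. What follows is a
// minimal sketch, not the listing's own code: SimpleSegment, FieldStatus and
// applyGnMarker() are hypothetical names, and the fixed 32-byte buffer is an
// assumption made only so the example is self-contained. It shows the same
// idea: a given-name segment holding NFN/NMN or FNU/MNU is kept as a segment,
// blanked, and given the matching status.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the listing's NH_NAME_FIELD_STATUS_* constants.
enum FieldStatus { STATUS_KNOWN, STATUS_UNKNOWN, STATUS_NON_EXISTANT };

// Hypothetical, simplified segment; the listing's segment type has more fields.
struct SimpleSegment {
    char segString[32];
    FieldStatus status;
};

// Blank out a given-name segment that holds one of the special markers and
// set its status. Any other segment is left untouched.
// Returns true if the segment was a marker.
bool applyGnMarker(SimpleSegment &seg) {
    FieldStatus newStatus;
    if (!std::strcmp(seg.segString, "NFN") || !std::strcmp(seg.segString, "NMN"))
        newStatus = STATUS_NON_EXISTANT;  // mapped to NON_EXISTANT in the listing
    else if (!std::strcmp(seg.segString, "FNU") || !std::strcmp(seg.segString, "MNU"))
        newStatus = STATUS_UNKNOWN;       // mapped to UNKNOWN in the listing
    else
        return false;
    seg.segString[0] = '\0';  // keep the segment, but make it blank
    seg.status = newStatus;
    return true;
}
```

// Collapsing the four strcmp branches into two is a design choice of this
// sketch only; the surname field would use NLN and LNU in the same way.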



// function to go through the segments and for each one, see if
// it is a TAQ value. If so, we associate the TAQ with the previous
// or following segment, depending on its type (i.e. prefix, suffix, etc).
// When we store the TAQ, we also store the action associated with
// the TAQ (currently DELETE or DISREGARD), since this information
// will be required to determine how to adjust the base segment score
//
// Deciding which segment to associate a TAQ with can get pretty
// hairy, especially when multiple TAQs can be in a name field
// consecutively. We use the following rules for single TAQ values:
//
//    TAQ Type        Segment to Associate with
//
//    Prefix          next segment
//    Suffix          previous segment
//    Infix           Not supported yet
//    Title           next segment
//    Qualifier       previous segment
//
// These are the basic rules for figuring out which segment to associate
// TAQs with:
//
//    - Any TAQ segments before the first Name segment are associated with
//      the first Name segment
//
//    - Any TAQ segments after the last Name segment are associated with
//      the last Name segment
//
//    - For TAQs that are surrounded by Name segments:
//
//        - All TAQs between a Name segment (on the left) and a suffix (qualifier)
//          (on the right) are associated with the Name segment.
//
//        - All TAQs not fitting the above are associated with the Name segment
//          they precede.
//

void NHNameData::processTAQValues(NHTAQTable *taqTable)
{
    // NHTAQAction taqAction;
    int i;
    NH_TAQRecordPtr tempTAQList[NH_MAX_TAQS_PER_SEGMENT];  // temp list of TAQs found
    int tempTAQSegIndex;               // temp index for the tempTAQList
    NH_TAQRecordPtr tempTAQRecordPtr;  // pointer to structure for a TAQ record
    int numTempTAQSegs;                // how many TAQs did we find
    int segIndex;                      // which segment are we looking at
    int lastPrefixIndex;               // index of last prefix like segment we got
    int lastSuffixIndex;               // index of last suffix like segment we got
    int lastNameIndex;                 // index of last non-TAQ segment we got
    int nameSegmentTaqListIndex;       // where to put taqs in a name segments taq list
    char *primaryCultureCode = nameParms->primaryCultureCode;
    char *secondaryCultureCode = nameParms->secondaryCultureCode;

    // clear out the TAQ counts for each segment.
    // This is important because the TAQ segments are not initialized
    // if they are not filled in.
    for (i = 0; i < numGnSegments; i++)
        gnSegments[i].numTAQs = 0;

    if (nameParms->getSeparateGnTaqs() == true) {
        // init some variables
        segIndex = 0;
        numTempTAQSegs = 0;

        // Start out by looking for TAQs at the start of the name field,
        // before any name segments.

        // while there are TAQ values at the start of the gn
        // get their associated TAQ record and place that in
        // a temporary list.
        while (segIndex < numGnSegments) {
            tempTAQRecordPtr = taqTable->getTAQSegment(gnSegments[segIndex].segString,
                                                       primaryCultureCode,
                                                       secondaryCultureCode);
            if (tempTAQRecordPtr != NULL) {
                // make sure we are not past our space for TAQs in the temp list
                // This would happen if a name field started out with tons of TAQs
                if (segIndex < NH_MAX_TAQS_PER_SEGMENT) {
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;
                }
                segIndex++;
            }
            else
                break;
        }

        // as long as we found a non-TAQ segment
        if (segIndex < numGnSegments) {
            // fill up the taqList for the first Name Segment with
            // each of the leading TAQs we found. If we found no TAQs above,
            // numTempTAQSegs will be 0, so we won't even enter into the loop.
            // Also, since we restricted the loop above, we are guaranteed to
            // not exceed our space for TAQs for a single segment.
            for (i = 0; i < numTempTAQSegs; i++) {
                gnSegments[segIndex].taqList[i].segString = gnSegments[i].segString;
                gnSegments[segIndex].taqList[i].taqAction = tempTAQList[i]->gnAction;
                gnSegments[segIndex].taqList[i].taqType = tempTAQList[i]->taqType;
                gnSegments[segIndex].numTAQs += 1;
            }

            // now move all the segments back starting with first name segment
            // ousting the leading TAQs. If we found that the first segment
            // was a name segment, we do not need to move anything.
            if (segIndex != 0) {
                for (i = segIndex; i < numGnSegments; i++) {
                    gnSegments[i - segIndex] = gnSegments[i];
                }
                // note that we now have fewer segments, since we removed some segments
                // that were TAQ values
                numGnSegments -= segIndex;
                // also, set the segIndex to 0, since we are now back at the beginning
                segIndex = 0;
            }



            // now start looking at the remaining segments
            // along the way, we must keep track of
            //    - the index of the last Name segment we found
            //      (starts out as 0, since we backed it up to 0)
            //    - the index of the last "suffix-like" TAQ we found
            //      (starts out as -1, since all TAQs were tacked onto seg 0)
            //    - the index of the last "prefix-like" TAQ we found
            //      (starts out as -1, since all TAQs were tacked onto seg 0)
            //
            // If we get a:
            //    Name:
            //       - associate everything between the lastNameIndex + 1 and the
            //         lastSuffixIndex with gnSegment[lastNameIndex];
            //       - associate everything between the lastPrefixIndex and
            //         segIndex - 1 with this name segment.
            //       - move everything back to oust the TAQ values from the gnSegment array
            //       - mark the new lastNameIndex (lastNameIndex = segIndex;)
            //       - adjust numGnSegments for how many TAQs we ousted
            //    "Suffix Like":
            //       - lastPrefixIndex = -1 (previous prefix now considered a suffix)
            //       - lastSuffixIndex = segIndex
            //    "Prefix Like":
            //       - lastPrefixIndex = segIndex
            //    End of Segments:
            //       - associate everything between the lastNameIndex + 1 and segIndex
            //         with gnSegment[lastNameIndex];
            //       - adjust numGnSegments for how many TAQs we had at end
            //
            // Note that we do not do any storing of anything until we either reach the
            // end of the segments, or get a non-TAQ segment.
            //
            // Also, as we read TAQ segments, we store a pointer to their retrieved
            // structure in a list. We do this because we must read ahead before
            // we can store a TAQ's relevant info (type, action) as being associated
            // with a segment, and we do not want to have to look up the TAQ info twice.



            numTempTAQSegs = 0;
            lastPrefixIndex = -1;
            lastSuffixIndex = -1;
            lastNameIndex = segIndex;
            segIndex++;    // look at the next segment

            while (segIndex < numGnSegments) {
                tempTAQRecordPtr = taqTable->getTAQSegment(gnSegments[segIndex].segString,
                                                           primaryCultureCode,
                                                           secondaryCultureCode);
                if (tempTAQRecordPtr == NULL) {
                    // segment is not a TAQ value
                    // do an initial check to make sure we actually got one or more TAQs.
                    // if not, all we really have to do is just reflect the new value for
                    // lastNameIndex.
                    if (numTempTAQSegs > 0) {
                        // so associate all taqs between the previous Name segment and
                        // the last suffix with the previous Name Segment. Since lastSuffixIndex
                        // may be -1 (if there were no suffixes), we may not even enter this for loop.
                        // this variable is necessary because the segment at lastNameIndex
                        // might already have TAQs stored in its taqList (due to prefixes).
                        // We must keep track of where the next available place in that list is.
                        nameSegmentTaqListIndex = gnSegments[lastNameIndex].numTAQs;
                        tempTAQSegIndex = 0;
                        for (i = lastNameIndex + 1;
                             (i <= lastSuffixIndex) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                             i++) {
                            gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = gnSegments[i].segString;
                            gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                            gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                            tempTAQSegIndex++;
                            nameSegmentTaqListIndex++;
                            gnSegments[lastNameIndex].numTAQs += 1;
                        }

                        // associate everything at or past the previous prefix(s) with the name
                        // segment we just found. Again, since there may not have been any
                        // prefixes, we might not even enter this for loop
                        if (lastPrefixIndex != -1) {
                            for (i = lastPrefixIndex;
                                 (i < segIndex) && (tempTAQSegIndex < NH_MAX_TAQS_PER_SEGMENT);
                                 i++) {
                                gnSegments[segIndex].taqList[i - lastPrefixIndex].segString = gnSegments[i].segString;
                                gnSegments[segIndex].taqList[i - lastPrefixIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                                gnSegments[segIndex].taqList[i - lastPrefixIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                                tempTAQSegIndex++;
                                gnSegments[segIndex].numTAQs += 1;
                            }
                        }

                        // now move all the segments back starting with this segment and
                        // ending with the last segment. We move them back to the first
                        // segment after the previous Name segment, which is numTempTAQSegs places
                        for (i = segIndex; i < numGnSegments; i++) {
                            gnSegments[i - numTempTAQSegs] = gnSegments[i];
                        }
                        //for (i = lastNameIndex + 1; i < numGnSegments; i++) {
                        //    gnSegments[i] = gnSegments[i + numTempTAQSegs];
                        //}

                        numGnSegments -= numTempTAQSegs;  // we now have fewer segments, since we got rid of some TAQs
                        segIndex -= numTempTAQSegs;       // move our pointer back too
                        numTempTAQSegs = 0;               // clear out the temp segment array
                    }
                    lastNameIndex = segIndex;             // mark the new lastNameIndex
                }
                else {
                    if ((tempTAQRecordPtr->taqType == 'P') ||
                        (tempTAQRecordPtr->taqType == 'T')) {
                        // got a prefix or a title
                        tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                        numTempTAQSegs++;
                        // only set the prefix index if we do not have one on record.
                        // otherwise, we will only get the right most prefix in a string
                        // of consecutive prefixes.
                        if (lastPrefixIndex == -1)
                            lastPrefixIndex = segIndex;
                    }
                    else {
                        // must be a suffix or qualifier
                        tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                        numTempTAQSegs++;
                        lastPrefixIndex = -1;   // any previous prefixes now considered a suffix
                        lastSuffixIndex = segIndex;
                    }
                }
                segIndex++;    // look at next segment
            }

            // now we are at the end of all segments, so make sure that any
            // TAQs that were trailing get associated with the last name segment.
            // do an initial check to make sure we actually got one or more TAQs.
            // if not, all we really have to do is just reflect the new value for
            // lastNameIndex.
            if (numTempTAQSegs > 0) {
                // associate all the stored TAQs with the last name segment.
                // in the loop below:
                //    i is the index into the gnSegments list for the TAQ string we are copying
                //    tempTAQSegIndex is the index into the tempTAQList for the saved TAQ info
                //    lastNameIndex is the index into the gnSegments for the name getting
                //       the TAQs associated with it.
                //    nameSegmentTaqListIndex is the index into the taqList for the name getting
                //       the TAQs associated with it.
                //
                // We must be careful that we do not overwrite any TAQs already associated with
                // the name (from prefixes). For this reason, we use separate indexes for the
                // tempTAQList and the gnSegments' taqList.
                nameSegmentTaqListIndex = gnSegments[lastNameIndex].numTAQs;
                tempTAQSegIndex = 0;
                for (i = lastNameIndex + 1;
                     (i < numGnSegments) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                     i++) {
                    gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = gnSegments[i].segString;
                    gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                    gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                    tempTAQSegIndex++;
                    nameSegmentTaqListIndex++;
                    gnSegments[lastNameIndex].numTAQs += 1;
                }
                // now we can just chop off all the TAQ segments by reducing numGnSegments.
                numGnSegments -= numTempTAQSegs;
            }
        }

        else {
            // we did not get any Non-TAQ segments. Move all the segments to the TAQ
            // list for the first segment, create a single segment, and set its string
            // value to "".
            gnSegments[0].numTAQs = 0;  // set this in case there were no TAQs (empty string)
                                        // In that case, we would not have cleared it out originally
            for (i = 0; i < numTempTAQSegs; i++) {
                gnSegments[0].taqList[i].segString = gnSegments[i].segString;
                gnSegments[0].taqList[i].taqAction = tempTAQList[i]->gnAction;
                gnSegments[0].taqList[i].taqType = tempTAQList[i]->taqType;
                gnSegments[0].numTAQs += 1;
            }
            numGnSegments = 1;
            gnSegments[0].segString = "";
            gnSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
        }
    }

    // as a last step, we must make sure that the number of gnSegments is
    // now no greater than NH_MAX_SEGS_AFTER_TAQ. We just ignore any segments
    // after the max.
    if (numGnSegments > NH_MAX_SEGS_AFTER_TAQ)
        numGnSegments = NH_MAX_SEGS_AFTER_TAQ;

    // clear out the TAQ counts for each segment.
    // This is important because the TAQ segments are not initialized
    // if they are not filled in.
    for (i = 0; i < numSnSegments; i++)
        snSegments[i].numTAQs = 0;

    // Now do the SN segments
    if (nameParms->getSeparateGnTaqs() == true) {
        // init some variables
        segIndex = 0;
        numTempTAQSegs = 0;

        // Start out by looking for TAQs at the start of the name field,
        // before any name segments.

        // while there are TAQ values at the start of the sn
        // get their associated TAQ record and place that in
        // a temporary list.
        while (segIndex < numSnSegments) {
            tempTAQRecordPtr = taqTable->getTAQSegment(snSegments[segIndex].segString,
                                                       primaryCultureCode,
                                                       secondaryCultureCode);
            if (tempTAQRecordPtr != NULL) {
                // make sure we are not past our space for TAQs in the temp list
                // This would happen if a name field started out with tons of TAQs
                if (segIndex < NH_MAX_TAQS_PER_SEGMENT) {
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;
                }
                segIndex++;
            }
            else
                break;
        }

        // as long as we found a non-TAQ segment
        if (segIndex < numSnSegments) {
            // fill up the taqList for the first Name Segment with
            // each of the leading TAQs we found. If we found no TAQs above,
            // numTempTAQSegs will be 0, so we won't even enter into the loop.
            // Also, since we restricted the loop above, we are guaranteed to
            // not exceed our space for TAQs for a single segment.
            for (i = 0; i < numTempTAQSegs; i++) {
                snSegments[segIndex].taqList[i].segString = snSegments[i].segString;
                snSegments[segIndex].taqList[i].taqAction = tempTAQList[i]->snAction;
                snSegments[segIndex].taqList[i].taqType = tempTAQList[i]->taqType;
                snSegments[segIndex].numTAQs += 1;
            }

            // now move all the segments back starting with first name segment
            // ousting the leading TAQs. If we found that the first segment
            // was a name segment, we do not need to move anything.
            if (segIndex != 0) {
                for (i = segIndex; i < numSnSegments; i++) {
                    snSegments[i - segIndex] = snSegments[i];
                }
                // note that we now have fewer segments, since we removed some segments
                // that were TAQ values
                numSnSegments -= segIndex;
                // also, set the segIndex to 0, since we are now back at the beginning
                segIndex = 0;
            }



            // now start looking at the remaining segments
            // along the way, we must keep track of
            //    - the index of the last Name segment we found
            //      (starts out as 0, since we backed it up to 0)
            //    - the index of the last "suffix-like" TAQ we found
            //      (starts out as -1, since all TAQs were tacked onto seg 0)
            //    - the index of the last "prefix-like" TAQ we found
            //      (starts out as -1, since all TAQs were tacked onto seg 0)
            //
            // If we get a:
            //    Name:
            //       - associate everything between the lastNameIndex + 1 and the
            //         lastSuffixIndex with snSegment[lastNameIndex];
            //       - associate everything between the lastPrefixIndex and
            //         segIndex - 1 with this name segment.
            //       - move everything back to oust the TAQ values from the snSegment array
            //       - mark the new lastNameIndex (lastNameIndex = segIndex;)
            //       - adjust numSnSegments for how many TAQs we ousted
            //    "Suffix Like":
            //       - lastPrefixIndex = -1 (previous prefix now considered a suffix)
            //       - lastSuffixIndex = segIndex
            //    "Prefix Like":
            //       - lastPrefixIndex = segIndex
            //    End of Segments:
            //       - associate everything between the lastNameIndex + 1 and segIndex
            //         with snSegment[lastNameIndex];
            //       - adjust numSnSegments for how many TAQs we had at end
            //
            // Note that we do not do any storing of anything until we either reach the
            // end of the segments, or get a non-TAQ segment.
            //
            // Also, as we read TAQ segments, we store a pointer to their retrieved
            // structure in a list. We do this because we must read ahead before
            // we can store a TAQ's relevant info (type, action) as being associated
            // with a segment, and we do not want to have to look up the TAQ info twice.

            numTempTAQSegs = 0;
            lastPrefixIndex = -1;
            lastSuffixIndex = -1;
            lastNameIndex = segIndex;
            segIndex++;    // look at the next segment

            while (segIndex < numSnSegments) {
                tempTAQRecordPtr = taqTable->getTAQSegment(snSegments[segIndex].segString,
                                                           primaryCultureCode,
                                                           secondaryCultureCode);
                if (tempTAQRecordPtr == NULL) {
                    // segment is not a TAQ value
                    // do an initial check to make sure we actually got one or more TAQs.
                    // if not, all we really have to do is just reflect the new value for
                    // lastNameIndex.
                    if (numTempTAQSegs > 0) {
                        // so associate all taqs between the previous Name segment and
                        // the last suffix with the previous Name Segment. Since lastSuffixIndex
                        // may be -1 (if there were no suffixes), we may not even enter this for loop.
                        // this variable is necessary because the segment at lastNameIndex
                        // might already have TAQs stored in its taqList (due to prefixes).
                        // We must keep track of where the next available place in that list is.
                        nameSegmentTaqListIndex = snSegments[lastNameIndex].numTAQs;
                        tempTAQSegIndex = 0;
                        for (i = lastNameIndex + 1;
                             (i <= lastSuffixIndex) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                             i++) {
                            snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = snSegments[i].segString;
                            snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                            snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                            tempTAQSegIndex++;
                            nameSegmentTaqListIndex++;
                            snSegments[lastNameIndex].numTAQs += 1;
                        }

                        // associate everything at or past the previous prefix(s) with the name
                        // segment we just found. Again, since there may not have been any
                        // prefixes, we might not even enter this for loop
                        if (lastPrefixIndex != -1) {
                            for (i = lastPrefixIndex;
                                 (i < segIndex) && (tempTAQSegIndex < NH_MAX_TAQS_PER_SEGMENT);
                                 i++) {
                                snSegments[segIndex].taqList[i - lastPrefixIndex].segString = snSegments[i].segString;
                                snSegments[segIndex].taqList[i - lastPrefixIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                                snSegments[segIndex].taqList[i - lastPrefixIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                                tempTAQSegIndex++;
                                snSegments[segIndex].numTAQs += 1;
                            }
                        }

                        // now move all the segments back starting with this segment and
                        // ending with the last segment. We move them back to the first
                        // segment after the previous Name segment, which is numTempTAQSegs places
                        for (i = segIndex; i < numSnSegments; i++) {
                            snSegments[i - numTempTAQSegs] = snSegments[i];
                        }

                        numSnSegments -= numTempTAQSegs;  // we now have fewer segments, since we got rid of some TAQs
                        segIndex -= numTempTAQSegs;       // move our pointer back too
                        numTempTAQSegs = 0;               // clear out the temp segment array
                    }
                    lastNameIndex = segIndex;             // mark the new lastNameIndex
                }
                else {
                    if ((tempTAQRecordPtr->taqType == 'P') ||
                        (tempTAQRecordPtr->taqType == 'T')) {
                        // got a prefix or a title
                        tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                        numTempTAQSegs++;
                        // only set the prefix index if we do not have one on record.
                        // otherwise, we will only get the right most prefix in a string
                        // of consecutive prefixes.
                        if (lastPrefixIndex == -1)
                            lastPrefixIndex = segIndex;
                    }
                    else {
                        // must be a suffix or qualifier
                        tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                        numTempTAQSegs++;
                        lastPrefixIndex = -1;   // any previous prefixes now considered a suffix
                        lastSuffixIndex = segIndex;
                    }
                }
                segIndex++;    // look at next segment
            }

            // now we are at the end of all segments, so make sure that any
            // TAQs that were trailing get associated with the last name segment.
            // do an initial check to make sure we actually got one or more TAQs.
            // if not, all we really have to do is just reflect the new value for
            // lastNameIndex.
            if (numTempTAQSegs > 0) {
                // associate all the stored TAQs with the last name segment.
                // in the loop below:
                //    i is the index into the snSegments list for the TAQ string we are copying
                //    tempTAQSegIndex is the index into the tempTAQList for the saved TAQ info
                //    lastNameIndex is the index into the snSegments for the name getting
                //       the TAQs associated with it.
                //    nameSegmentTaqListIndex is the index into the taqList for the name getting
                //       the TAQs associated with it.
                //
                // We must be careful that we do not overwrite any TAQs already associated with
                // the name (from prefixes). For this reason, we use separate indexes for the
                // tempTAQList and the snSegments' taqList.
                nameSegmentTaqListIndex = snSegments[lastNameIndex].numTAQs;
                tempTAQSegIndex = 0;
                for (i = lastNameIndex + 1;
                     (i < numSnSegments) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                     i++) {
                    snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = snSegments[i].segString;
                    snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                    snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                    tempTAQSegIndex++;
                    nameSegmentTaqListIndex++;
                    snSegments[lastNameIndex].numTAQs += 1;
                }
                // now we can just chop off all the TAQ segments by reducing numSnSegments.
                numSnSegments -= numTempTAQSegs;
            }
        }

        else {
            // we did not get any Non-TAQ segments. Move all the segments to the TAQ
            // list for the first segment, create a single segment, and set its string
            // value to "".
            snSegments[0].numTAQs = 0;  // set this in case there were no TAQs (empty string)
                                        // In that case, we would not have cleared it out originally
            for (i = 0; i < numTempTAQSegs; i++) {
                snSegments[0].taqList[i].segString = snSegments[i].segString;
                snSegments[0].taqList[i].taqAction = tempTAQList[i]->snAction;
                snSegments[0].taqList[i].taqType = tempTAQList[i]->taqType;
                snSegments[0].numTAQs += 1;
            }
            numSnSegments = 1;
            snSegments[0].segString = "";
            snSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
        }
    }

    // as a last step, we must make sure that the number of snSegments is
    // now no greater than NH_MAX_SEGS_AFTER_TAQ. We just ignore any segments
    // after the max.
    if (numSnSegments > NH_MAX_SEGS_AFTER_TAQ)
        numSnSegments = NH_MAX_SEGS_AFTER_TAQ;
}
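
// The association rules implemented by processTAQValues() can be sketched more
// compactly. What follows is a minimal sketch under stated assumptions, not the
// listing's implementation: Token, Kind, NameSeg and associateTaqs() are
// hypothetical names, tokens are assumed to be classified already (the listing
// does this through taqTable->getTAQSegment()), titles are folded into
// "prefix-like" and qualifiers into "suffix-like", and the per-segment capacity
// limits and TAQ actions are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Kind { Name, PrefixLike, SuffixLike };  // hypothetical classification

struct Token {
    std::string text;
    Kind kind;
};

struct NameSeg {
    std::string text;
    std::vector<std::string> taqs;  // TAQs associated with this name segment
};

std::vector<NameSeg> associateTaqs(const std::vector<Token> &tokens) {
    std::vector<NameSeg> out;
    std::vector<Token> pending;  // TAQs seen since the last name segment

    for (const Token &t : tokens) {
        if (t.kind != Kind::Name) {
            pending.push_back(t);
            continue;
        }
        NameSeg seg{t.text, {}};
        // split = one past the last suffix-like TAQ in the pending run.
        // Everything before split belongs to the previous name segment,
        // everything from split on belongs to the name segment just found.
        std::size_t split = 0;
        for (std::size_t i = 0; i < pending.size(); ++i)
            if (pending[i].kind == Kind::SuffixLike) split = i + 1;
        if (out.empty()) split = 0;  // leading TAQs all go to the first name
        for (std::size_t i = 0; i < split; ++i)
            out.back().taqs.push_back(pending[i].text);
        for (std::size_t i = split; i < pending.size(); ++i)
            seg.taqs.push_back(pending[i].text);
        pending.clear();
        out.push_back(seg);
    }
    // No name segment at all: create a single blank segment.
    if (out.empty()) out.push_back(NameSeg{"", {}});
    // Trailing TAQs go to the last name segment.
    for (const Token &t : pending)
        out.back().taqs.push_back(t.text);
    return out;
}
```

// With the tokens DR (prefix-like), JOHN, JR (suffix-like), VAN (prefix-like),
// SMITH, PHD (suffix-like), this sketch attaches DR and JR to JOHN and VAN and
// PHD to SMITH. Using growable vectors instead of the listing's fixed arrays
// avoids the in-place shifting, at the cost of not mirroring its storage layout.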



// function to generate index keys for this name.
// Each key includes a portion for the GN and a portion
// for the SN.
// We currently support two key lengths, 32 bits or 64 bits.
// The GN length does not have to be the same as the SN length,
// but GN keys generated must be the same length (similarly for
// SN). Thus the full key length could be:
//
//    64:  Both GN and SN are 32 bits
//    96:  GN is 64, but SN is 32
//    96:  GN is 32, but SN is 64
//    128: Both GN and SN are 64 bits
//
// Keys are generated by name stem segment. The first key
// consists of a key for the first GN segment, and a key
// for the first SN segment. The second key
// consists of a key for the second GN segment, and a key
// for the second SN segment. When there are a differing number
// of GN and SN segments, the final segment of the name
// field with the fewer number of segments is repeated.
// Thus, the number of keys generated is given by the formula:
//    n = max(numGnSegs, numSnSegs)
//
// We do things this way so that a name has the same number of keys
// for both GN and SN, and in fact we can view the two keys as one
// contiguous key that can be passed to comparison functions as a
// single value.
//
// Note that we are talking about stem segments (TAQ segments have
// been removed).
//
// maxKeys specifies how many keys the caller can fit into
// keyBuff. It is up to the caller to make sure that they have allocated
// enough space in the keyBuff to hold maxKeys.

unsigned char NHNameData::genIndexKeys(int maxKeys, NHKeyWidth gnKeyWidth,
                                       NHKeyWidth snKeyWidth, void *keyBuff)
{
    int numKeysGenerated = 0;
    int gnSegIndex = 0;
    int snSegIndex = 0;
    unsigned int *keyPtr = (unsigned int *)keyBuff;

    while (numKeysGenerated < maxKeys) {
        if ((gnSegIndex >= numGnSegments) && (snSegIndex >= numSnSegments))
            break;
        else {
            numKeysGenerated++;

            // make sure that if one segment is now at the end,
            // we stay on the last segment
            if (gnSegIndex == numGnSegments)
                gnSegIndex--;
            if (snSegIndex == numSnSegments)
                snSegIndex--;

            if (gnKeyWidth == NH_KEY_WIDTH_32) {
                // gn key length is 32
                *keyPtr = globalDigraphBitmapArray.get32BitKeyForToken(gnSegments[gnSegIndex].segString);
                keyPtr++;       // move the pointer by 4 bytes
            }
            else {
                // gn key length is 64
                globalDigraphBitmapArray.get64BitKeyForToken(gnSegments[gnSegIndex].segString,
                                                             (bit_64_t *)keyPtr);
                keyPtr += 2;    // move the pointer by 8 bytes
            }

            if (snKeyWidth == NH_KEY_WIDTH_32) {
                // sn key length is 32
                *keyPtr = globalDigraphBitmapArray.get32BitKeyForToken(snSegments[snSegIndex].segString);
                keyPtr++;       // move the pointer by 4 bytes
            }
            else {
                // sn key length is 64
                globalDigraphBitmapArray.get64BitKeyForToken(snSegments[snSegIndex].segString,
                                                             (bit_64_t *)keyPtr);
                keyPtr += 2;    // move the pointer by 8 bytes
            }

            // advance the segment indexes
            snSegIndex++;
            gnSegIndex++;
        }
    }
    return numKeysGenerated;
}
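
// The pairing rule used for index keys (the field with fewer segments repeats
// its final segment, so max(numGnSegs, numSnSegs) keys come out) can be sketched
// on its own. This is a minimal sketch, not the listing's implementation:
// toyKey32() is a hypothetical stand-in for get32BitKeyForToken(), whose real
// digraph bitmap is not shown in this listing, and pairKeys() is a hypothetical
// helper that handles only the 32-bit/32-bit case and returns pairs rather than
// writing into a caller-supplied buffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy key: sets one bit per adjacent character pair.
// It only illustrates the shape of a 32-bit bitmap key.
std::uint32_t toyKey32(const std::string &token) {
    std::uint32_t key = 0;
    for (std::size_t i = 0; i + 1 < token.size(); ++i) {
        unsigned pair = static_cast<unsigned char>(token[i]) * 31u +
                        static_cast<unsigned char>(token[i + 1]);
        key |= (1u << (pair % 32u));
    }
    return key;
}

// Pair GN and SN segment keys. The shorter field repeats its final segment,
// so max(gn.size(), sn.size()) pairs are produced, capped at maxKeys.
std::vector<std::pair<std::uint32_t, std::uint32_t>>
pairKeys(const std::vector<std::string> &gn,
         const std::vector<std::string> &sn,
         std::size_t maxKeys) {
    std::vector<std::pair<std::uint32_t, std::uint32_t>> keys;
    // Assumption: each field has at least one segment, as the parsing code
    // earlier in the listing creates a blank segment for an empty field.
    if (gn.empty() || sn.empty()) return keys;
    std::size_t n = std::min(maxKeys, std::max(gn.size(), sn.size()));
    for (std::size_t k = 0; k < n; ++k) {
        const std::string &g = gn[std::min(k, gn.size() - 1)];
        const std::string &s = sn[std::min(k, sn.size() - 1)];
        keys.emplace_back(toyKey32(g), toyKey32(s));
    }
    return keys;
}
```

// For a given name with three segments and a surname with one, this sketch
// yields three pairs whose surname half is the same key each time.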



// File: NHEvalNameData.cpp
//
// Description:
//
//    Implementation to the NHEvalNameData class.
//
//
// History:
//
//    5/14/97    EFB    Created
//    9/1/97     EFB    Lots of changes to support retaining segment scores in
//                      best mode so that sorting can be more detailed and accurate
//    10/31/97   EFB    Made several member functions protected, and made performComp()
//                      a friend of NHQueryNameData. Also changed performComp to
//                      NOT delete objects that are not passed on to the resultslist,
//                      to accommodate the new method of deleting NHEvalNameData objects.
//    11/03/97   EFB    Added a new function, calcNameScore() and made it virtual.
//                      removed virtual from performComp. The perform comp method
//                      was too complicated to be subclassed. We really only want
//                      callers to be able to affect the name score and the determination
//                      of HIT/NO_HIT. These are now the only virtual functions. Both
//                      are now inline in the header file so the caller knows exactly
//                      what is happening in these functions if they decide to subclass
//                      and override. OOPS, I forgot compareScore(), which is also
//                      virtual - we want them to be able to change how hits are sorted.
//
//    3/02/98    EFB    Made lots of changes necessary when I moved a bunch of
//                      parameters (the ones associated with parsing the name)
//                      from the NHCompParms class into a new class called NHNameParms.
//                      and renamed the NHCompParms class to NHCompParms.
//    3/20/98    EFB    Changed names to NH from SN



#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#include "NHEvalNameData.hpp"
#include "NHQueryNameData.hpp"
#include "NH_util.hpp"
#include "NH_queens_arrays.hpp"

#include "NHVariantTable.hpp"
#include "NHResultsList.hpp"
#include "NHTAQTable.hpp"
#include "NHNameParms.hpp"



// private, non-member function prototype
static double NH_digraph_score(char *qSeg, int qSegLen,
                               char *evalSeg, int evalSegLen,
                               bool useLeftDigraphBias);

static double NH_best_score(int numQSegs, int numEvalSegs,
                            NHSegScoreMode scoreMode,
                            double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ]);

void NH_best_score_for_highest_mode(int xDim, int yDim, double highestScore,
                                    double *bestSegScores,
                                    double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ]);

static double NH_calc_score(SegList qSegs, int numQSegs,
                            SegList evalSegs, int numEvalSegs,
                            SegListVariants querySegmentVariants,
                            char *primaryCulture,
                            char *secondaryCulture,
                            NHCompParms *compParms,
                            NHNameParms *nameParms,
                            NHNameFields nameField,
                            char *origQNameField,
                            char *origEvalNameField,
                            int *numQSegsScored,
                            double *bestSegScores);

static void NH_apply_TAQs_to_score(double *diScore, Segment *qSeg,
                                   Segment *evalSeg,
                                   double absDelTAQFactor,
                                   double absDisTAQFactor,
                                   double delTAQFactor,
                                   double disTAQFactor);

static bool NH_check_compressed_name(char *qSegString, char *evalSegString,
                                     char *compressCharsPart1,
                                     char *compressCharsPart2);



NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *aGn, char *aSn) :
    NHNameData(nParms, aGn, aSn)
{
    resetScores();
}

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *aGn, char *aSn, char *aMn) :
    NHNameData(nParms, aGn, aSn, aMn)
{
    resetScores();
}

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *name,
                               NHNameFormat nameFormat) :
    NHNameData(nParms, name, nameFormat)
{
    resetScores();
}



// construct an object from an archived representation in
// a stream.
//
// The archive is in the following order
//
//    gnLen
//    snLen
//    nameStorage

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, istream &inStream) :
    NHNameData(nParms, inStream)
{
    // read the gn, sn and name scores
    if (inStream)
        inStream.read((char *)&gnScore, sizeof(gnScore));
    if (inStream)
        inStream.read((char *)&snScore, sizeof(snScore));
    if (inStream)
        inStream.read((char *)&nameScore, sizeof(nameScore));

    // seg differentials
    if (inStream)
        inStream.read((char *)&gnSegDifferential, sizeof(gnSegDifferential));
    if (inStream)
        inStream.read((char *)&snSegDifferential, sizeof(snSegDifferential));

    // read the number of gn segs scored, and however many scores we need
    // inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
    if (inStream)
        inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
    if (inStream) {
        if (numGnSegsScored > 0) {
            inStream.read((char *)gnSegScores, numGnSegsScored * sizeof(double));
        }
    }

    // read the number of sn segs scored, and however many scores we need
    if (inStream)
        inStream.read((char *)&numSnSegsScored, sizeof(numSnSegsScored));
    if (inStream) {
        if (numSnSegsScored > 0) {
            inStream.read((char *)snSegScores, numSnSegsScored * sizeof(double));
        }
    }
}



NHEvalNameData::~NHEvalNameData()
{
}



bool NHEvalNameData::archiveData(ostream &outStream)
{
    bool rc = true;

    rc = NHNameData::archiveData(outStream);
    if (rc) {
        // write the gn, sn and name scores
        outStream.write((char *)&gnScore, sizeof(gnScore));
        outStream.write((char *)&snScore, sizeof(snScore));
        outStream.write((char *)&nameScore, sizeof(nameScore));

        // seg differentials
        outStream.write((char *)&gnSegDifferential, sizeof(gnSegDifferential));
        outStream.write((char *)&snSegDifferential, sizeof(snSegDifferential));

        // write the number of gn segs scored, and however many scores we need
        // inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
        outStream.write((char *)&numGnSegsScored, sizeof(numGnSegsScored));
        if (numGnSegsScored > 0) {
            outStream.write((char *)gnSegScores, numGnSegsScored * sizeof(double));
        }

        // write the number of sn segs scored, and however many scores we need
        outStream.write((char *)&numSnSegsScored, sizeof(numSnSegsScored));
        if (numSnSegsScored > 0) {
            outStream.write((char *)snSegScores, numSnSegsScored * sizeof(double));
        }
    }

    return rc;
}



// note that this function is a friend of NHQueryNameData, which is
// why we are able to access private member functions of that class.
void inline NHEvalNameData::calcComponentScores(NHQueryNameData *queryName)
{
    char *primaryCulture = nameParms->primaryCultureCode;
    char *secondaryCulture = nameParms->secondaryCultureCode;

    // do the digraph compare and set the scores
    gnScore = NH_calc_score(queryName->gnSegments, queryName->numGnSegments,
                            gnSegments, numGnSegments,
                            queryName->gnSegmentVariants,
                            primaryCulture, secondaryCulture,
                            compParms,
                            nameParms,
                            NH_FIRST_NAME,
                            queryName->gn, gn,
                            &numGnSegsScored,
                            gnSegScores);

    snScore = NH_calc_score(queryName->snSegments, queryName->numSnSegments,
                            snSegments, numSnSegments,
                            queryName->snSegmentVariants,
                            primaryCulture, secondaryCulture,
                            compParms,
                            nameParms,
                            NH_LAST_NAME,
                            queryName->sn, sn,
                            &numSnSegsScored,
                            snSegScores);
}



// note that this function is a friend of NHQueryNameData, which is
// why we are able to access private member functions of that class.
NHReturnCode NHEvalNameData::performComp(NHQueryNameData *queryName,
                                         NHCompParms *someCompParms)
{
    NHReturnCode compResult;
    NHResultsList *resultList;

    // save the compParms so that they can be easily referenced
    // throughout the comparison process.
    compParms = someCompParms;

    calcComponentScores(queryName);

    // call a method to calculate the name score.
    calcNameScore();

    // store the segments differentials, in case we get a tie score.
    gnSegDifferential = abs(numGnSegments - queryName->getNumGnSegments());
    snSegDifferential = abs(numSnSegments - queryName->getNumSnSegments());

    // Now call the getCompResult() function to get the return value
    // (i.e. was it a match?)
    compResult = getCompResult();

    // now see if we are working with a results list
    resultList = queryName->getResultsList();
    if (resultList != NULL) {
        // we are using a result list. If this is a hit, add it
        // to the result list.
        // Otherwise, delete it
        if (compResult == NH_MATCH) {
            NHReturnCode tempInsertResult;

            // make sure the insert works. If so, don't mess with
            // the compResult, so the comparison will be returned
            // as a hit. If there was an error, delete this object,
            // and save the error code so it can be returned.
            tempInsertResult = resultList->addHit(this);
            if (tempInsertResult != NH_SUCCESS) {
                compResult = tempInsertResult;
            }
        }
    }

    return compResult;
}



// used only when the segment mode is set to HIGHEST.
// It compares the segment scores that were retained when
// the name was compared to the query name.
// We are comparing the segment scores for two (pre-scored)
// eval names. The comparison should find which name has
// the "best" set of segment scores, where best is defined
// as "the one with the highest best score". If the best
// score results in a tie, we move on to the second best score,
// and so on until we find a difference, or there are no more
// segments to compare. Each name has variables numGnSegsScored
// and numSnSegsScored, that tell how many segments were scored
// in the name. We do up to N comparisons, where N is the larger
// of the number of segments scored in each name. Where one name
// has fewer segments scored than the other, a default value of
// NH_DEFAULT_MISSING_SEGMENT_SCORE is assigned. This is so that
// a scored segment has to beat some threshold to be considered
// better than nothing at all.
//
double NHEvalNameData::compareSegmentScores(NHEvalNameData *scoredName,
                                            NHNameFields nameField)
{
    double scoreDiff;
    int maxComparisons;
    double *thisEvalScores;
    double *compEvalScores;
    int numSegsScoredForThisEval;
    int numSegsScoredForCompEval;

    if (nameField == NH_LAST_NAME) {
        thisEvalScores = snSegScores;
        compEvalScores = scoredName->snSegScores;
        numSegsScoredForThisEval = numSnSegsScored;
        numSegsScoredForCompEval = scoredName->numSnSegsScored;
    }
    else {
        thisEvalScores = gnSegScores;
        compEvalScores = scoredName->gnSegScores;
        numSegsScoredForThisEval = numGnSegsScored;
        numSegsScoredForCompEval = scoredName->numGnSegsScored;
    }

    maxComparisons = numSegsScoredForThisEval > numSegsScoredForCompEval ?
                     numSegsScoredForThisEval : numSegsScoredForCompEval;

    for (int i = 0; i < maxComparisons; i++) {
        if (i >= numSegsScoredForThisEval)
            thisEvalScores[i] = NH_DEFAULT_MISSING_SEGMENT_SCORE;
        else // we can do an else because only one segment can be missing, not both
            if (i >= numSegsScoredForCompEval)
                compEvalScores[i] = NH_DEFAULT_MISSING_SEGMENT_SCORE;

        scoreDiff = compEvalScores[i] - thisEvalScores[i];
        if (scoreDiff != 0)
            break;
    }

    return scoreDiff;
}
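The tie-break walk described for compareSegmentScores can be illustrated in isolation. The following is a minimal sketch, not the listing's implementation: the function name, the use of std::vector, and the value of kMissingSegmentScore are assumptions made for illustration (the listing does not give the value of NH_DEFAULT_MISSING_SEGMENT_SCORE). Unlike the listing, the sketch reads missing entries as the default score instead of writing the default into the score arrays.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for NH_DEFAULT_MISSING_SEGMENT_SCORE; the real value
// is not shown in the listing.
const double kMissingSegmentScore = 0.5;

// Returns other[i] - mine[i] at the first position where the two score lists
// differ, or 0.0 if they tie at every position. Both lists are assumed to be
// ordered best score first. A list that runs out early is treated as holding
// the default score; the inputs are not modified.
double compareSegmentScoresSketch(const std::vector<double> &mine,
                                  const std::vector<double> &other)
{
    assert(std::is_sorted(mine.begin(), mine.end(), std::greater<double>()));
    assert(std::is_sorted(other.begin(), other.end(), std::greater<double>()));

    const std::size_t n = std::max(mine.size(), other.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double a = i < mine.size() ? mine[i] : kMissingSegmentScore;
        const double b = i < other.size() ? other[i] : kMissingSegmentScore;
        if (b != a)
            return b - a;   // positive: the other name has the better set
    }
    return 0.0;
}
```

A positive result means the name passed as the argument has the better set of segment scores, which matches the sign convention of scoreDiff in the listing.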



/* NH_calc_score

   Performs a string comparison on two name fields.
   Returns a value between 0.00 and
   1.00, with 1.00 being an exact fit
*/
double NH_calc_score(SegList qSegs, int numQSegs,
                     SegList evalSegs, int numEvalSegs,
                     SegListVariants querySegmentVariants,
                     char *primaryCulture,
                     char *secondaryCulture,
                     NHCompParms *compParms,
                     NHNameParms *nameParms,
                     NHNameFields nameField,
                     char *origQNameField,
                     char *origEvalNameField,
                     int *numSegsScored,
                     double *bestSegScores)
{
    NHAnchorSegMode anchorSeg;
    NHSegScoreMode scoreMode;
    double oopsFactor;
    double absDelTAQFactor;
    double absDisTAQFactor;
    double delTAQFactor;
    double disTAQFactor;
    bool matchInit;
    double initScore;
    double initialOnInitialMatchScore;
    bool checkVariant;
    // double variantScore;
    bool leftDigraphBias;
    double anchorFactor;
    double nameUnknownScore;
    double noNameScore;
    double scoresTable[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ];
                            // scores for segment pairs
    int qIndex;             // temp index for query segments
    int evalIndex;          // temp index for eval segments
    int qSegLen;            // hold string length of query segment
    int evalSegLen;         // hold string length of eval segment
    double diScore = [illegible];
                            // temp score for single pair comparison
    double hiScore = [illegible];
                            // temp score to hold best score as we iterate,
                            // which lets us avoid NH_best_score in mode=BEST
    bool areVariants;       // temp flag to hold if the pair are variants
    double returnValue = 0.0;
    NHVariantTable *variantTable;
    double varScore;
    NHVarId evalSegVarId;
    bool scoreTaqs;
    double compressedNameScore;
    bool checkCompressedName;

    // set some parameters based on the name field
    if (nameField == NH_LAST_NAME) {
        anchorSeg = compParms->getSnAnchorSegmentMode();
        scoreMode = compParms->getSnSegmentScoreMode();
        oopsFactor = compParms->getSnOOPSFactor();
        matchInit = compParms->getMatchSnIntial();
        initScore = compParms->getSnInitialScore();
        initialOnInitialMatchScore = compParms->getSnInitialOnInitialMatchScore();
        checkVariant = compParms->getUseSnVariants();
        anchorFactor = compParms->getSnAnchorFactor();
        leftDigraphBias = compParms->getUseSnLeftBias();
        nameUnknownScore = compParms->getLNUScore();
        noNameScore = compParms->getNLNScore();
        scoreTaqs = compParms->getScoreSnTAQs();
        absDelTAQFactor = compParms->getAbsDelSnTAQFactor();
        absDisTAQFactor = compParms->getAbsDisSnTAQFactor();
        delTAQFactor = compParms->getDelSnTAQFactor();
        disTAQFactor = compParms->getDisSnTAQFactor();
        compressedNameScore = compParms->getSnCompressedNameScore();
        checkCompressedName = compParms->getCheckSnCompressedName();
        variantTable = nameParms->snVariantTable;
    }
    else {
        anchorSeg = compParms->getGnAnchorSegmentMode();
        scoreMode = compParms->getGnSegmentScoreMode();
        oopsFactor = compParms->getGnOOPSFactor();
        matchInit = compParms->getMatchGnIntial();
        initScore = compParms->getGnInitialScore();
        initialOnInitialMatchScore = compParms->getGnInitialOnInitialMatchScore();
        checkVariant = compParms->getUseGnVariants();
        anchorFactor = compParms->getGnAnchorFactor();
        leftDigraphBias = compParms->getUseGnLeftBias();
        nameUnknownScore = compParms->getFNUScore();
        noNameScore = compParms->getNFNScore();
        scoreTaqs = compParms->getScoreGnTAQs();
        absDelTAQFactor = compParms->getAbsDelGnTAQFactor();
        absDisTAQFactor = compParms->getAbsDisGnTAQFactor();
        delTAQFactor = compParms->getDelGnTAQFactor();
        disTAQFactor = compParms->getDisGnTAQFactor();
        compressedNameScore = compParms->getGnCompressedNameScore();
        checkCompressedName = compParms->getCheckGnCompressedName();
        variantTable = nameParms->gnVariantTable;
    }



    // clear out the scores table
    for (qIndex = 0; qIndex < NH_MAX_SEGS_AFTER_TAQ; ++qIndex)
        for (evalIndex = 0; evalIndex < NH_MAX_SEGS_AFTER_TAQ; ++evalIndex)
            scoresTable[qIndex][evalIndex] = 0.0;

    // now go through each possible combination of segment pairs
    // (created by matching a query segment against an eval segment).
    // Store the scores in the scoresTable.
    for (qIndex = 0; qIndex < numQSegs; ++qIndex) {
        qSegLen = strlen(qSegs[qIndex].segString);

        for (evalIndex = 0; evalIndex < numEvalSegs; ++evalIndex) {
            evalSegLen = strlen(evalSegs[evalIndex].segString);

            // first check for either the query or eval segment being
            // blank.
            if ((qSegLen == 0) || (evalSegLen == 0)) {
                // We make a distinction between "unknown"
                // and "none". The table below shows the scores
                // we assign for the various combinations of
                // Known - K, Unknown - U, and None - N.
                //
                //     |      K       |           U            |           N
                //  ---+--------------+------------------------+------------------------
                //   K |     N/A      |      unknownScore      |       NoneScore
                //   U | unknownScore | (unknownScore + 1) / 2 | (unknownScore + 1) / 2
                //   N |  NoneScore   | (unknownScore + 1) / 2 |  (NoneScore + 1) / 2
                //
                if (qSegs[qIndex].status == NH_NAME_FIELD_STATUS_KNOWN) {
                    // we should not need to check for both being known
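The Known/Unknown/None score table in the comment above can be read as a small lookup function. The sketch below is an illustration only: the enum, the function name, and the parameter names are assumptions, and the cell values follow the table as far as it can be read from the listing.

```cpp
#include <cassert>

// Hypothetical status enum mirroring the K/U/N labels in the comment table.
enum class FieldStatus { Known, Unknown, None };

// Score for a segment pair in which at least one side is blank.
// unknownScore and noneScore stand in for the configured values that the
// listing holds in nameUnknownScore and noNameScore.
double blankPairScore(FieldStatus q, FieldStatus e,
                      double unknownScore, double noneScore)
{
    // The Known/Known cell is "N/A": that pair is scored by the digraph
    // comparison, not by this table.
    assert(!(q == FieldStatus::Known && e == FieldStatus::Known));

    if (q == FieldStatus::Known || e == FieldStatus::Known) {
        const FieldStatus other = (q == FieldStatus::Known) ? e : q;
        return other == FieldStatus::Unknown ? unknownScore : noneScore;
    }
    if (q == FieldStatus::None && e == FieldStatus::None)
        return (noneScore + 1.0) / 2.0;

    // Unknown/Unknown, Unknown/None and None/Unknown share one value.
    return (unknownScore + 1.0) / 2.0;
}
```

Averaging with 1 in the cells where neither side is known gives those pairs a score above the Known-against-blank case, which is the effect the table encodes.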



// File: NHQueryNameData.cpp
//
// Description:
//
// Implementation of the NHQueryNameData class.
//
//
// History:
//
// 5/14/97 EFB Created
// 3/20/98 EFB Changed names to NH from SN

#include <string.h>
#include <stdio.h>

#include "NHQueryNameData.hpp"
#include "NHVariantTable.hpp"
#include "NHResultsList.hpp"
#include "NH_util.hpp"
#include "NHDigraphBitmapArray.hpp"
#include "NHNameParms.hpp"

extern NHDigraphBitmapArray globalDigraphBitmapArray;

#define NH_INDEX_THRESH 0.5



NHQueryNameData::NHQueryNameData(NHNameParms *nParms, char *aGn, char *aSn) :
    NHNameData(nParms, aGn, aSn)
{
    resultsList = NULL;
    keysArray = NULL;
    numBitsInGnKeys = NULL;
    numBitsInSnKeys = NULL;

    processVariantValues(nParms->gnVariantTable,
                         nParms->snVariantTable);
}

NHQueryNameData::NHQueryNameData(NHNameParms *nParms, char *aGn, char *aSn, char *aMn) :
    NHNameData(nParms, aGn, aSn, aMn)
{
    resultsList = NULL;
    keysArray = NULL;
    numBitsInGnKeys = NULL;
    numBitsInSnKeys = NULL;

    processVariantValues(nParms->gnVariantTable,
                         nParms->snVariantTable);
}

NHQueryNameData::NHQueryNameData(NHNameParms *nParms, char *name,
                                 NHNameFormat nameFormat) :
    NHNameData(nParms, name, nameFormat)
{
    resultsList = NULL;
    keysArray = NULL;
    numBitsInGnKeys = NULL;
    numBitsInSnKeys = NULL;

    processVariantValues(nParms->gnVariantTable,
                         nParms->snVariantTable);
}



NHQueryNameData::~NHQueryNameData()
{
    if (keysArray != NULL)
        delete [] keysArray;

    if (numBitsInGnKeys != NULL)
        delete [] numBitsInGnKeys;

    if (numBitsInSnKeys != NULL)
        delete [] numBitsInSnKeys;
}



// Function to get a pointer to a NHVariant object for each name
// segment. We do this here, in the query
// name, so that lookups only have to be done once for the query name.
// Note also that we check first to make sure that we are supposed to be
// using variants (we do this per name field).
void NHQueryNameData::processVariantValues(NHVariantTable *gnVariantTable,
                                           NHVariantTable *snVariantTable)
{
    int i;

    if (nameParms->getUseGnVariants()) {
        for (i = 0; i < numGnSegments; i++)
            gnSegmentVariants[i] =
                gnVariantTable->getVariantObjectForName(gnSegments[i].segString);
    }

    if (nameParms->getUseSnVariants()) {
        for (i = 0; i < numSnSegments; i++)
            snSegmentVariants[i] =
                snVariantTable->getVariantObjectForName(snSegments[i].segString);
    }
}



// function to allocate space for, and generate, the keys for
// this query name. The caller calls this explicitly with the
// desired key widths for the GN and SN. We use these
// values in conjunction with the numGnSegments and numSnSegments
// to calculate how big to make the array that will hold the keys.
void NHQueryNameData::prepareKeys(NHKeyWidth gnKeyWidth,
                                  NHKeyWidth snKeyWidth)
{
    int keyArraySize;
    unsigned char largerNumberOfSegments;
    int fullKeyLen;

    // first allocate the keys
    if (numSnSegments > numGnSegments)
        largerNumberOfSegments = numSnSegments;
    else
        largerNumberOfSegments = numGnSegments;
    if (gnKeyWidth == NH_KEY_WIDTH_32) {
        if (snKeyWidth == NH_KEY_WIDTH_32)
            fullKeyLen = 64;
        else
            fullKeyLen = 96;
    }
    else {
        if (snKeyWidth == NH_KEY_WIDTH_32)
            fullKeyLen = 96;
        else
            fullKeyLen = 128;
    }
    keyArraySize = largerNumberOfSegments * fullKeyLen;
    keysArray = new unsigned int[keyArraySize];

    // save the key lengths
    queryGnKeyWidth = gnKeyWidth;
    querySnKeyWidth = snKeyWidth;

    // now generate the keys for the query
    numBitmapKeys = genIndexKeys(largerNumberOfSegments, gnKeyWidth,
                                 snKeyWidth, keysArray);

    // now allocate space for the arrays that hold the number of
    // bits turned on for each key in the GN and SN.
    numBitsInGnKeys = new unsigned char[largerNumberOfSegments];
    numBitsInSnKeys = new unsigned char[largerNumberOfSegments];

    unsigned char *keysArrayBytePtr = (unsigned char *)keysArray;
    for (int i = 0; i < numBitmapKeys; i++) {
        if (gnKeyWidth == NH_KEY_WIDTH_32) {
            // the number of bits turned on is the sum of the number of bits
            // in each of the 4 bytes that make up the 32 bit value
            numBitsInGnKeys[i] =
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++));
        }
        else {
            // the number of bits turned on is the sum of the number of bits
            // in each of the 8 bytes that make up the 64 bit value
            numBitsInGnKeys[i] =
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++));
        }

        // now do the surname
        if (snKeyWidth == NH_KEY_WIDTH_32) {
            // the number of bits turned on is the sum of the number of bits
            // in each of the 4 bytes that make up the 32 bit value
            numBitsInSnKeys[i] =
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++));
        }
        else {
            // the number of bits turned on is the sum of the number of bits
            // in each of the 8 bytes that make up the 64 bit value
            numBitsInSnKeys[i] =
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++)) +
                globalDigraphBitmapArray.getNumBitsForByte(*(keysArrayBytePtr++));
        }
    }
}
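prepareKeys counts the bits set in each key one byte at a time through getNumBitsForByte. The listing does not show how NHDigraphBitmapArray implements that call; a common approach is a precomputed 256-entry table, sketched below. The class name BitCountTable and the helper countKeyBits are assumptions made for illustration only.

```cpp
#include <cassert>

// Hypothetical stand-in for NHDigraphBitmapArray: a 256-entry table giving the
// number of set bits in every possible byte value.
class BitCountTable {
public:
    BitCountTable()
    {
        for (int v = 0; v < 256; ++v) {
            int bits = 0;
            for (int b = v; b != 0; b >>= 1)
                bits += b & 1;
            counts_[v] = static_cast<unsigned char>(bits);
        }
    }

    unsigned char getNumBitsForByte(unsigned char value) const
    {
        return counts_[value];
    }

private:
    unsigned char counts_[256];
};

// Number of set bits in a key that is keyBytes bytes long
// (4 for a 32 bit key, 8 for a 64 bit key).
int countKeyBits(const BitCountTable &table, const unsigned char *key, int keyBytes)
{
    assert(keyBytes > 0);
    int total = 0;
    for (int i = 0; i < keyBytes; ++i)
        total += table.getNumBitsForByte(key[i]);
    return total;
}
```

A loop over the byte count replaces the four or eight unrolled calls in the listing and gives the same total for either key width.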



#define NH_EITHER_NH_OR_GN 1
#define NH_BOTH_NH_AND_GN 2

// function to compare the key(s) for this query name against
// a supplied key from an eval name. Before this function is
// called, the caller must have called the
// prepareKeys() method, which sets the gnKeyLength and
// snKeyLength variables, and generates the keys for this
// query name.
// The comparison is performed by looking at the given name
// and surname portions of the key separately. For each of these
// subkeys, we see how many bits match, and calculate the quotient of
// matching bits / bits that could have matched. This score is
// compared to ???. If the score for either the GN or SN comparison
// is favorable, the function returns true to indicate that the
// evaluation name associated with the supplied key is a possible
// match, and should be retrieved for further consideration.
// Since this object (the query) could generate multiple keys,
// we may have to perform several comparisons.
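The quotient test described in the comment above can be sketched for a single 32 bit subkey as follows. This is an illustration under stated assumptions, not the listing's code: the function names are hypothetical, the bit count uses a plain loop instead of the byte lookup table, and the handling of a query key with no bits set is an assumption (the listing does not show a guard for that case).

```cpp
#include <cassert>
#include <cstdint>

// Portable population count for a 32 bit word.
int popcount32(std::uint32_t v)
{
    int bits = 0;
    for (; v != 0; v &= v - 1)   // each pass clears the lowest set bit
        ++bits;
    return bits;
}

// True when the fraction of the query key's set bits that are also set in the
// eval key exceeds threshold (NH_INDEX_THRESH is defined as 0.5 in the listing).
bool subkeyPasses(std::uint32_t queryKey, std::uint32_t evalKey, double threshold)
{
    assert(threshold >= 0.0 && threshold < 1.0);
    const int possible = popcount32(queryKey);
    if (possible == 0)
        return false;            // assumption: an empty query key never passes
    const int matched = popcount32(queryKey & evalKey);
    return static_cast<double>(matched) / possible > threshold;
}

// Combines the given name and surname results. requireBoth corresponds to the
// NH_BOTH_NH_AND_GN mode; otherwise one passing subkey is enough.
bool keyPasses(bool passedGn, bool passedSn, bool requireBoth)
{
    return requireBoth ? (passedGn && passedSn) : (passedGn || passedSn);
}
```

With a threshold of 0.5, a query key with four bits set passes against an eval key that shares three of them and fails against one that shares only two, because the comparison is strictly greater than the threshold.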

bool NHQueryNameData::compareKey(unsigned int *evalBitMapKey,
                                 unsigned char numEvalKeys)
{
    bool rc = false;
    unsigned int *evalKeyPtr;
    unsigned int *queryKeyPtr;
    unsigned int *masterQueryKeyPtr = keysArray;
    unsigned int maskedVal;
    unsigned char numBitsThatMatched;
    unsigned char *bytePtr;
    bool passedGn = false;
    bool passedSn = false;
    int indexMode = NH_BOTH_NH_AND_GN;

    // for each of the query's keys, do both a SN and GN comparison
    // our nested loop compares the first GN and SN query key to
    // all the eval keys (inner loop), and then moves on to the next
    // query key (outer loop).
    for (int i = 0; (i < numBitmapKeys) && (rc == false); i++) {
        evalKeyPtr = evalBitMapKey;   // start the eval ptr at the beginning
        for (int j = 0; j < (int)numEvalKeys; j++) {
            // place the queryKeyPtr back to the beginning of the
            // current query key. This value gets advanced after we have
            // compared the current query key to all eval keys
            queryKeyPtr = masterQueryKeyPtr;

            // first, check the given name
            if (queryGnKeyWidth == NH_KEY_WIDTH_32) {
                // just compare a 32 bit key for the gn
                maskedVal = *evalKeyPtr & *queryKeyPtr;
                bytePtr = (unsigned char *)&maskedVal;
                numBitsThatMatched =
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++));
                if ((double)numBitsThatMatched /
                    (double)numBitsInGnKeys[i] > NH_INDEX_THRESH) {
                    if (indexMode == NH_EITHER_NH_OR_GN) {
                        rc = true;
                        break;
                    }
                    else {
                        // looking for both, is SN already set?
                        if (passedSn) {   // yes, so we matched both
                            rc = true;
                            break;
                        }
                        else
                            // no, just set the gn flag
                            passedGn = true;
                    }
                }
                evalKeyPtr++;   // advance pointers
                queryKeyPtr++;
            }
            else {
                // just compare a 64 bit key for the gn
                maskedVal = *evalKeyPtr & *queryKeyPtr;
                bytePtr = (unsigned char *)&maskedVal;
                numBitsThatMatched =
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++));
                evalKeyPtr++;   // advance pointers to get to second 32 bits in this 64 bit key
                queryKeyPtr++;
                maskedVal = *evalKeyPtr & *queryKeyPtr;
                bytePtr = (unsigned char *)&maskedVal;
                numBitsThatMatched +=
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++));
                if ((double)numBitsThatMatched /
                    (double)numBitsInGnKeys[i] > NH_INDEX_THRESH) {
                    if (indexMode == NH_EITHER_NH_OR_GN) {
                        rc = true;
                        break;
                    }
                    else {
                        // looking for both, is SN already set?
                        if (passedSn) {   // yes, so we matched both
                            rc = true;
                            break;
                        }
                        else
                            // no, just set the gn flag
                            passedGn = true;
                    }
                }
                evalKeyPtr++;   // advance pointers
                queryKeyPtr++;
            }

            // now, check the surname
            if (querySnKeyWidth == NH_KEY_WIDTH_32) {
                // just compare a 32 bit key for the sn
                maskedVal = *evalKeyPtr & *queryKeyPtr;
                bytePtr = (unsigned char *)&maskedVal;
                numBitsThatMatched =
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +
                    globalDigraphBitmapArray.getNumBitsForByte(*(bytePtr++)) +



                    NH_NAME_FIELD_STATUS_NON_EXISTANT;
            } else if (!strcmp(gnSegments[i].segString, "MNU")) {
                gnSegments[i].segString[0] = EOS;
                gnSegments[i].status = NH_NAME_FIELD_STATUS_UNKNOWN;
            }
        }
    }

    // now the sn segs
    if (numSnSegments == 0) {
        numSnSegments = 1;
        snSegments[0].segString = "";
        snSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
    }
    else if (nameParms->getCheckSnUnknowns()) {
        for (i = 0; i < numSnSegments; i++) {
            if (!strcmp(snSegments[i].segString, "NLN")) {
                snSegments[i].segString[0] = EOS;
                snSegments[i].status = NH_NAME_FIELD_STATUS_NON_EXISTANT;
            } else if (!strcmp(snSegments[i].segString, "LNU")) {
                snSegments[i].segString[0] = EOS;
                snSegments[i].status = NH_NAME_FIELD_STATUS_UNKNOWN;
            }
        }
    }
}



// function to go through the segments and for each one, see if
// it is a TAQ value. If so, we associate the TAQ with the previous
// or following segment, depending on its type (i.e. prefix, suffix, etc).
// When we store the TAQ, we also store the action associated with
// the TAQ (currently DELETE or DISREGARD), since this information
// will be required to determine how to adjust the base segment score
//
// Deciding which segment to associate a TAQ with can get pretty
// hairy, especially when multiple TAQs can be in a name field
// consecutively. We use the following rules for single TAQ values:
//
//      TAQ Type        Segment to Associate with
//
//      Prefix          next segment
//      Suffix          previous segment
//      Infix           Not supported yet
//      Title           next segment
//      Qualifier       previous segment
//
// These are the basic rules for figuring out which segment to associate
// TAQs with:
//
//   - Any TAQ segments before the first Name segment are associated with
//     the first name segment
//
//   - Any TAQ segments after the last Name segment are associated with
//     the last Name segment
//
//   - For TAQs that are surrounded by Name segments:
//
//       - All TAQs between a Name segment (on the left) and a suffix (qualifier)
//         (on the right) are associated with the Name Segment.
//
//       - All TAQs not fitting the above are associated with the Name segment
//         they precede.
//
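The single-TAQ rules in the table above, together with the rules for leading and trailing TAQs, can be sketched as one lookup. This is a simplified illustration: the Token structure, the TaqType enum, and ownerOfTaq are hypothetical names, and the sketch decides each TAQ on its own type only, so it does not implement the rule for runs of consecutive TAQs that end in a suffix.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical TAQ categories taken from the comment table. Infix is left out
// because the comment says it is not supported yet.
enum class TaqType { Prefix, Suffix, Title, Qualifier };

// One token of a name field: either a name segment or a TAQ value.
struct Token {
    std::string text;
    bool isTaq;
    TaqType type;   // meaningful only when isTaq is true
};

// For the TAQ at position pos, returns the index of the name segment it is
// associated with, or -1 if the field contains no name segment at all.
int ownerOfTaq(const std::vector<Token> &tokens, int pos)
{
    const int count = static_cast<int>(tokens.size());
    assert(pos >= 0 && pos < count && tokens[pos].isTaq);

    int prev = -1;
    int next = -1;
    for (int i = pos - 1; i >= 0 && prev < 0; --i)
        if (!tokens[i].isTaq) prev = i;
    for (int i = pos + 1; i < count && next < 0; ++i)
        if (!tokens[i].isTaq) next = i;

    if (prev < 0) return next;   // leading TAQs go to the first name segment
    if (next < 0) return prev;   // trailing TAQs go to the last name segment

    const bool suffixLike = tokens[pos].type == TaqType::Suffix ||
                            tokens[pos].type == TaqType::Qualifier;
    return suffixLike ? prev : next;
}
```

For the tokens DR JOHN JR VAN DYKE, with DR a title, JR a suffix and VAN a prefix, the sketch associates DR and JR with JOHN and VAN with DYKE.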

void NHNameData::processTAQValues(NHTAQTable *taqTable)
{
    // NHTAQAction taqAction;
    int i;
    NH_TAQRecordPtr tempTAQList[NH_MAX_TAQS_PER_SEGMENT];
                                        // temp list of TAQs found
    int tempTAQSegIndex;                // temp index for the tempTaqList
    NH_TAQRecordPtr tempTAQRecordPtr;   // pointer to structure for a TAQ record
    int numTempTAQSegs;                 // how many TAQs did we find
    int segIndex;                       // which segment are we looking at
    int lastPrefixIndex;                // index of last prefix like segment we got
    int lastSuffixIndex;                // index of last suffix like segment we got
    int lastNameIndex;                  // index of last non-TAQ segment we got
    int nameSegmentTaqListIndex;        // where to put taqs in a name segments taq list
    char *primaryCultureCode = nameParms->primaryCultureCode;
    char *secondaryCultureCode = nameParms->secondaryCultureCode;

    // clear out the TAQ counts for each segment.
    // This is important because the TAQ segments are not initialized
    // if they are not filled in.
    for (i = 0; i < numGnSegments; i++)
        gnSegments[i].numTAQs = 0;

    if (nameParms->getSeparateGnTaqs() == true) {
        // init some variables
        segIndex = 0;
        numTempTAQSegs = 0;

        // Start out by looking for TAQs at the start of the name field,
        // before any name segments.

        // while there are TAQ values at the start of the gn
        // get their associated TAQ record and place that in
        // a temporary list.
        while (segIndex < numGnSegments) {
            tempTAQRecordPtr = taqTable->getTAQSegment(gnSegments[segIndex].segString,
                                                       primaryCultureCode,
                                                       secondaryCultureCode);
            if (tempTAQRecordPtr != NULL) {
                // make sure we are not past our space for TAQs in the temp list
                // This would happen if a name field started out with tons of TAQs
                if (segIndex < NH_MAX_TAQS_PER_SEGMENT) {
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;
                }
                segIndex++;
            }
            else
                break;
        }

        // as long as we found a non-TAQ segment
        if (segIndex < numGnSegments) {
            // fill up the taqList for the first Name Segment with
            // each of the leading TAQs we found. If we found no TAQs above,
            // numTempTAQSegs will be 0, so we won't even enter into the loop.
            // Also, since we restricted the loop above, we are guaranteed to
            // not exceed our space for TAQs for a single segment.
            for (i = 0; i < numTempTAQSegs; i++) {
                gnSegments[segIndex].taqList[i].segString = gnSegments[i].segString;
                gnSegments[segIndex].taqList[i].taqAction = tempTAQList[i]->gnAction;
                gnSegments[segIndex].taqList[i].taqType = tempTAQList[i]->taqType;
                gnSegments[segIndex].numTAQs += 1;
            }

            // now move all the segments back starting with first name segment
            // ousting the leading TAQs. If we found that the first segment
            // was a name segment, we do not need to move anything.
            if (segIndex != 0) {
                for (i = segIndex; i < numGnSegments; i++) {
                    gnSegments[i - segIndex] = gnSegments[i];
                }

                // note that we now have less segments, since we removed some segments
                // that were TAQ values
                numGnSegments -= segIndex;

                // also, set the segIndex to 0, since we are now back at the beginning
                segIndex = 0;
            }



            // now start looking at the remaining segments
            // along the way, we must keep track of
            //   - the index of the last Name segment we found
            //     (start out as 0, since we backed it up to 0)
            //   - the index of the last "suffix-like" TAQ we found
            //     (starts out as -1, since all TAQs were tacked onto seg 0)
            //   - the index of the last "prefix-like" TAQ we found
            //     (starts out as -1, since all TAQs were tacked onto seg 0)
            //
            // If we get a:
            //   Name:
            //     - associate everything between the lastNameIndex + 1 and the
            //       lastSuffixIndex with gnSegment[lastNameIndex];
            //     - associate everything between the lastPrefixIndex and
            //       segIndex - 1 with this name segment.
            //     - move everything back to oust the TAQ values from the
            //       gnSegment array
            //     - mark the new lastNameIndex (lastNameIndex = segIndex;)
            //     - adjust numGnSegments for how many TAQs we ousted
            //   "Suffix Like":
            //     - lastPrefixIndex = -1; previous prefix now considered a suffix
            //     - lastSuffixIndex = segIndex
            //   "Prefix Like":
            //     - lastPrefixIndex = segIndex
            //   End of Segments:
            //     - associate everything between the lastNameIndex + 1 and
            //       segIndex with gnSegment[lastNameIndex];
            //     - adjust numGnSegments for how many TAQs we had at end
            //
            // Note that we do not do any storing of anything until we either
            // reach the end of the segments, or get a non-taq segment.
            //
            // Also, as we read TAQ segments, we store a pointer to their retrieved
            // structure in a list. We do this because we must read ahead before
            // we can store a TAQ's relevant info (type, action) as being associated
            // with a segment, and we do not want to have to look up the TAQ info twice.

numTempTAQSegs = 0; 

last Pref ixindex = -1; 
lastSuf f ixindex = -1; 
lastNamelndex = seglndex; 

seglndex++; // look at the next segment 

while (seglndex < numGnSegments ) { 
terapTAQRecordPtr = taqTable- 
>getTAQSegment (gnSegmehts f seglndex] . segString, 

primaryCultureCode, 

secondaryCultureCode) ; 

            if (tempTAQRecordPtr == NULL) {
                // segment is not a TAQ value
                // do an initial check to make sure we actually got one or more TAQs.
                // if not, all we really have to do is just reflect the new value for
                // lastNameIndex.
                if (numTempTAQSegs > 0) {
                    // so associate all TAQs between the previous Name segment and
                    // the last suffix with the previous Name Segment. Since lastSuffixIndex
                    // may be -1 (if there were no suffixes), we may not even enter this for loop.
                    // nameSegmentTaqListIndex is necessary because the segment at lastNameIndex
                    // might already have TAQs stored in its taqList (due to prefixes).
                    // We must keep track of where the next available place in that list is.
                    nameSegmentTaqListIndex = gnSegments[lastNameIndex].numTAQs;
                    tempTAQSegIndex = 0;
                    for (i = lastNameIndex + 1;
                         (i <= lastSuffixIndex) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                         i++) {
                        gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = gnSegments[i].segString;
                        gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                        gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                        tempTAQSegIndex++;
                        nameSegmentTaqListIndex++;
                        gnSegments[lastNameIndex].numTAQs += 1;
                    }

                    // associate everything at or past the previous prefix(es) with the name
                    // segment we just found. Again, since there may not have been any
                    // prefixes, we might not even enter this for loop
                    if (lastPrefixIndex != -1) {
                        for (i = lastPrefixIndex;
                             (i < segIndex) && (tempTAQSegIndex < NH_MAX_TAQS_PER_SEGMENT);
                             i++) {
                            gnSegments[segIndex].taqList[i - lastPrefixIndex].segString = gnSegments[i].segString;
                            gnSegments[segIndex].taqList[i - lastPrefixIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                            gnSegments[segIndex].taqList[i - lastPrefixIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                            tempTAQSegIndex++;
                            gnSegments[segIndex].numTAQs += 1;
                        }
                    }

                    // now move all the segments back starting with this segment and
                    // ending with the last segment. We move them back to the first
                    // segment after the previous Name segment, which is numTempTAQSegs places
                    for (i = segIndex; i < numGnSegments; i++) {
                        gnSegments[i - numTempTAQSegs] = gnSegments[i];
                    }

                    //for (i = lastNameIndex + 1; i < numGnSegments; i++) {
                    //    gnSegments[i] = gnSegments[i + numTempTAQSegs];
                    //}

                    numGnSegments -= numTempTAQSegs; // we now have fewer segments, since we got rid of some TAQs
                    segIndex -= numTempTAQSegs;      // move our pointer back too
                    numTempTAQSegs = 0;              // clear out the temp segment array
                }
                lastNameIndex = segIndex;            // mark the new lastNameIndex
            }

            else {
                if ((tempTAQRecordPtr->taqType == 'P') ||
                    (tempTAQRecordPtr->taqType == 'T')) {
                    // got a prefix or a title
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;

                    // only set the prefix index if we do not have one on record,
                    // otherwise, we will only get the right most prefix in a string
                    // of consecutive prefixes.
                    if (lastPrefixIndex == -1)
                        lastPrefixIndex = segIndex;
                }
                else {
                    // must be a suffix or qualifier
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;
                    lastPrefixIndex = -1; // any previous prefixes now considered a suffix
                    lastSuffixIndex = segIndex;
                }
            }
            segIndex++; // look at next segment
        }

        // now we are at the end of all segments, so make sure that any
        // TAQs that were trailing get associated with the last name segment.
        // do an initial check to make sure we actually got one or more TAQs.
        // if not, all we really have to do is just reflect the new value for
        // lastNameIndex.
        if (numTempTAQSegs > 0) {
            // associate all the stored TAQs with the last name segment.
            // in the loop below:
            //    i is the index into the gnSegments list for the TAQ string we are copying
            //    tempTAQSegIndex is the index into the tempTAQList for the saved TAQ info
            //    lastNameIndex is the index into the gnSegments for the name getting
            //        the TAQs associated with it.
            //    nameSegmentTaqListIndex is the index into the taqList for the name getting
            //        the TAQs associated with it.
            //
            // We must be careful that we do not overwrite any TAQs already associated with
            // the name (from prefixes). For this reason, we use separate indexes for the
            // tempTAQList and the gnSegments' taqList.
            nameSegmentTaqListIndex = gnSegments[lastNameIndex].numTAQs;
            tempTAQSegIndex = 0;
            for (i = lastNameIndex + 1;
                 (i < numGnSegments) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                 i++) {
                gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = gnSegments[i].segString;
                gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->gnAction;
                gnSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                tempTAQSegIndex++;
                nameSegmentTaqListIndex++;
                gnSegments[lastNameIndex].numTAQs += 1;
            }

            // now we can just chop off all the TAQ segments by reducing numGnSegments.
            numGnSegments -= numTempTAQSegs;
        }
    }

    else {
        // we did not get any Non-TAQ segments. Move all the segments to the TAQ
        // list for the first segment, create a single segment, and set its string
        // value to "".
        gnSegments[0].numTAQs = 0; // set this in case there were no TAQs (empty string)
                                   // In that case, we would not have cleared it out originally
        for (i = 0; i < numTempTAQSegs; i++) {
            gnSegments[0].taqList[i].segString = gnSegments[i].segString;
            gnSegments[0].taqList[i].taqAction = tempTAQList[i]->gnAction;
            gnSegments[0].taqList[i].taqType = tempTAQList[i]->taqType;
            gnSegments[0].numTAQs += 1;
        }
        numGnSegments = 1;
        gnSegments[0].segString = "";
        gnSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
    }
}

// as a last step, we must make sure that the number of gnSegments is
// now no greater than NH_MAX_SEGS_AFTER_TAQ. We just ignore any segments
// after the max.
if (numGnSegments > NH_MAX_SEGS_AFTER_TAQ)
    numGnSegments = NH_MAX_SEGS_AFTER_TAQ;

// clear out the TAQ counts for each segment.
// This is important because the TAQ segments are not initialized
// if they are not filled in.
for (i = 0; i < numSnSegments; i++)
    snSegments[i].numTAQs = 0;

// Now do the SN segments
if (nameParms->getSeparateGnTaqs() == true) {
    // init some variables
    segIndex = 0;
    numTempTAQSegs = 0;

    // Start out by looking for TAQs at the start of the name field,
    // before any name segments.
    // while there are TAQ values at the start of the sn
    // get their associated TAQ record and place that in
    // a temporary list.
    while (segIndex < numSnSegments) {
        tempTAQRecordPtr = taqTable->getTAQSegment(snSegments[segIndex].segString,
                                                   primaryCultureCode,
                                                   secondaryCultureCode);
        if (tempTAQRecordPtr != NULL) {
            // make sure we are not past our space for TAQs in the temp list
            // This would happen if a name field started out with tons of TAQs
            if (segIndex < NH_MAX_TAQS_PER_SEGMENT) {
                tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                numTempTAQSegs++;
            }
            segIndex++;
        }
        else
            break;
    }

    // as long as we found a non-TAQ segment
    if (segIndex < numSnSegments) {
        // fill up the taqList for the first Name Segment with
        // each of the leading TAQs we found. If we found no TAQs above,
        // numTempTAQSegs will be 0, so we won't even enter into the loop.
        // Also, since we restricted the loop above, we are guaranteed to
        // not exceed our space for TAQs for a single segment.
        for (i = 0; i < numTempTAQSegs; i++) {
            snSegments[segIndex].taqList[i].segString = snSegments[i].segString;
            snSegments[segIndex].taqList[i].taqAction = tempTAQList[i]->snAction;
            snSegments[segIndex].taqList[i].taqType = tempTAQList[i]->taqType;
            snSegments[segIndex].numTAQs += 1;
        }

        // now move all the segments back starting with first name segment
        // ousting the leading TAQs. If we found that the first segment
        // was a name segment, we do not need to move anything.
        if (segIndex != 0) {
            for (i = segIndex; i < numSnSegments; i++) {
                snSegments[i - segIndex] = snSegments[i];
            }
            // note that we now have fewer segments, since we removed some segments
            // that were TAQ values
            numSnSegments -= segIndex;

            // also, set the segIndex to 0, since we are now back at the beginning
            segIndex = 0;
        }



        // now start looking at the remaining segments
        // along the way, we must keep track of
        //    - the index of the last Name segment we found
        //      (start out as 0, since we backed it up to 0)
        //    - the index of the last "suffix-like" TAQ we found
        //      (starts out as -1, since all TAQs were tacked onto seg 0)
        //    - the index of the last "prefix-like" TAQ we found
        //      (starts out as -1, since all TAQs were tacked onto seg 0)
        //
        // If we get a:
        //    Name:            - associate everything between the lastNameIndex + 1 and the
        //                       lastSuffixIndex with snSegment[lastNameIndex];
        //                     - associate everything between the lastPrefixIndex and
        //                       segIndex - 1 with this name segment.
        //                     - move everything back to oust the TAQ values from the
        //                       snSegment array
        //                     - mark the new lastNameIndex (lastNameIndex = segIndex;)
        //                     - adjust numSnSegments for how many TAQs we ousted
        //    "Suffix Like":     lastPrefixIndex = -1  (previous prefix now considered a suffix)
        //                       lastSuffixIndex = segIndex
        //    "Prefix Like":     lastPrefixIndex = segIndex
        //    End of Segments: - associate everything between the lastNameIndex + 1 and segIndex
        //                       with snSegment[lastNameIndex];
        //                     - adjust numSnSegments for how many TAQs we had at end
        //
        // Note that we do not do any storing of anything until we either reach the
        // end of the segments, or get a non-TAQ segment.
        //
        // Also, as we read TAQ segments, we store a pointer to their retrieved
        // structure in a list. We do this because we must read ahead before
        // we can store a TAQ's relevant info (type, action) as being associated
        // with a segment, and we do not want to have to look up the TAQ info twice.

        numTempTAQSegs = 0;
        lastPrefixIndex = -1;
        lastSuffixIndex = -1;
        lastNameIndex = segIndex;

        segIndex++; // look at the next segment

        while (segIndex < numSnSegments) {
            tempTAQRecordPtr = taqTable->getTAQSegment(snSegments[segIndex].segString,
                                                       primaryCultureCode,
                                                       secondaryCultureCode);

            if (tempTAQRecordPtr == NULL) {
                // segment is not a TAQ value
                // do an initial check to make sure we actually got one or more TAQs.
                // if not, all we really have to do is just reflect the new value for
                // lastNameIndex.
                if (numTempTAQSegs > 0) {
                    // so associate all TAQs between the previous Name segment and
                    // the last suffix with the previous Name Segment. Since lastSuffixIndex
                    // may be -1 (if there were no suffixes), we may not even enter this for loop.
                    // nameSegmentTaqListIndex is necessary because the segment at lastNameIndex
                    // might already have TAQs stored in its taqList (due to prefixes).
                    // We must keep track of where the next available place in that list is.
                    nameSegmentTaqListIndex = snSegments[lastNameIndex].numTAQs;
                    tempTAQSegIndex = 0;
                    for (i = lastNameIndex + 1;
                         (i <= lastSuffixIndex) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                         i++) {
                        snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = snSegments[i].segString;
                        snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                        snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                        tempTAQSegIndex++;
                        nameSegmentTaqListIndex++;
                        snSegments[lastNameIndex].numTAQs += 1;
                    }



                    // associate everything at or past the previous prefix(es) with the name
                    // segment we just found. Again, since there may not have been any
                    // prefixes, we might not even enter this for loop
                    if (lastPrefixIndex != -1) {
                        for (i = lastPrefixIndex;
                             (i < segIndex) && (tempTAQSegIndex < NH_MAX_TAQS_PER_SEGMENT);
                             i++) {
                            snSegments[segIndex].taqList[i - lastPrefixIndex].segString = snSegments[i].segString;
                            snSegments[segIndex].taqList[i - lastPrefixIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                            snSegments[segIndex].taqList[i - lastPrefixIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                            tempTAQSegIndex++;
                            snSegments[segIndex].numTAQs += 1;
                        }
                    }

                    // now move all the segments back starting with this segment and
                    // ending with the last segment. We move them back to the first
                    // segment after the previous Name segment, which is numTempTAQSegs places
                    for (i = segIndex; i < numSnSegments; i++)
                        snSegments[i - numTempTAQSegs] = snSegments[i];

                    numSnSegments -= numTempTAQSegs; // we now have fewer segments, since we got rid of some TAQs
                    segIndex -= numTempTAQSegs;      // move our pointer back too
                    numTempTAQSegs = 0;              // clear out the temp segment array
                }
                lastNameIndex = segIndex;            // mark the new lastNameIndex
            }
            else {
                if ((tempTAQRecordPtr->taqType == 'P') ||
                    (tempTAQRecordPtr->taqType == 'T')) {
                    // got a prefix or a title
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;

                    // only set the prefix index if we do not have one on record,
                    // otherwise, we will only get the right most prefix in a string
                    // of consecutive prefixes.
                    if (lastPrefixIndex == -1)
                        lastPrefixIndex = segIndex;
                }
                else {
                    // must be a suffix or qualifier
                    tempTAQList[numTempTAQSegs] = tempTAQRecordPtr;
                    numTempTAQSegs++;
                    lastPrefixIndex = -1; // any previous prefixes now considered a suffix
                    lastSuffixIndex = segIndex;
                }
            }
            segIndex++; // look at next segment
        }

        // now we are at the end of all segments, so make sure that any
        // TAQs that were trailing get associated with the last name segment.
        // do an initial check to make sure we actually got one or more TAQs.
        // if not, all we really have to do is just reflect the new value for
        // lastNameIndex.

        if (numTempTAQSegs > 0) {
            // associate all the stored TAQs with the last name segment.
            // in the loop below:
            //    i is the index into the snSegments list for the TAQ string we are copying
            //    tempTAQSegIndex is the index into the tempTAQList for the saved TAQ info
            //    lastNameIndex is the index into the snSegments for the name getting
            //        the TAQs associated with it.
            //    nameSegmentTaqListIndex is the index into the taqList for the name getting
            //        the TAQs associated with it.
            //
            // We must be careful that we do not overwrite any TAQs already associated with
            // the name (from prefixes). For this reason, we use separate indexes for the
            // tempTAQList and the snSegments' taqList.
            nameSegmentTaqListIndex = snSegments[lastNameIndex].numTAQs;
            tempTAQSegIndex = 0;
            for (i = lastNameIndex + 1;
                 (i < numSnSegments) && (nameSegmentTaqListIndex < NH_MAX_TAQS_PER_SEGMENT);
                 i++) {
                snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].segString = snSegments[i].segString;
                snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqAction = tempTAQList[tempTAQSegIndex]->snAction;
                snSegments[lastNameIndex].taqList[nameSegmentTaqListIndex].taqType = tempTAQList[tempTAQSegIndex]->taqType;
                tempTAQSegIndex++;
                nameSegmentTaqListIndex++;
                snSegments[lastNameIndex].numTAQs += 1;
            }

            // now we can just chop off all the TAQ segments by reducing numSnSegments.
            numSnSegments -= numTempTAQSegs;
        }
    }

    else {
        // we did not get any Non-TAQ segments. Move all the segments to the TAQ
        // list for the first segment, create a single segment, and set its string
        // value to "".
        snSegments[0].numTAQs = 0; // set this in case there were no TAQs (empty string)
                                   // In that case, we would not have cleared it out originally
        for (i = 0; i < numTempTAQSegs; i++) {
            snSegments[0].taqList[i].segString = snSegments[i].segString;
            snSegments[0].taqList[i].taqAction = tempTAQList[i]->snAction;
            snSegments[0].taqList[i].taqType = tempTAQList[i]->taqType;
            snSegments[0].numTAQs += 1;
        }
        numSnSegments = 1;
        snSegments[0].segString = "";
        snSegments[0].status = NH_NAME_FIELD_STATUS_UNKNOWN;
    }
}

// as a last step, we must make sure that the number of snSegments is
// now no greater than NH_MAX_SEGS_AFTER_TAQ. We just ignore any segments
// after the max.
if (numSnSegments > NH_MAX_SEGS_AFTER_TAQ)
    numSnSegments = NH_MAX_SEGS_AFTER_TAQ;
}
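
The association rules described in the comments above can be illustrated on a much simpler data model. The following is only a sketch under stated assumptions, not the listing's implementation: `NameSeg`, `taqTypeOf`, and `separateTaqs` are hypothetical names, the tiny lookup table stands in for `taqTable->getTAQSegment()`, and the `NH_MAX_TAQS_PER_SEGMENT` cap is omitted for brevity.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified segment: a name stem plus the TAQs attached to it.
struct NameSeg {
    std::string text;
    std::vector<std::string> taqs;
};

// Hypothetical stand-in for the TAQ table lookup. Returns 'P' (prefix),
// 'T' (title), 'S' (suffix), or '\0' when the token is not a TAQ.
char taqTypeOf(const std::string& token) {
    static const std::map<std::string, char> table = {
        {"DR", 'T'}, {"VAN", 'P'}, {"DE", 'P'}, {"JR", 'S'}, {"III", 'S'}};
    std::map<std::string, char>::const_iterator it = table.find(token);
    return it == table.end() ? '\0' : it->second;
}

// Attach each TAQ to a neighbouring name segment:
//  - leading TAQs go to the first name,
//  - between two names, everything up to the last suffix goes to the previous
//    name and the trailing run of prefixes/titles goes to the next name,
//  - trailing TAQs go to the last name,
//  - with no name at all, a single empty segment carries every TAQ.
std::vector<NameSeg> separateTaqs(const std::vector<std::string>& tokens) {
    std::vector<NameSeg> names;
    std::vector<std::string> pending;  // TAQs read since the last name segment
    std::size_t prefixStart = 0;       // where the current prefix run begins in pending
    bool inPrefixRun = false;

    for (const std::string& token : tokens) {
        const char type = taqTypeOf(token);
        if (type == '\0') {
            NameSeg seg;
            seg.text = token;
            std::size_t split = inPrefixRun ? prefixStart : pending.size();
            if (names.empty()) split = 0;  // no previous name to receive suffixes
            for (std::size_t i = 0; i < split; ++i)
                names.back().taqs.push_back(pending[i]);
            for (std::size_t i = split; i < pending.size(); ++i)
                seg.taqs.push_back(pending[i]);
            names.push_back(seg);
            pending.clear();
            inPrefixRun = false;
        } else if (type == 'P' || type == 'T') {
            if (!inPrefixRun) {
                prefixStart = pending.size();
                inPrefixRun = true;
            }
            pending.push_back(token);
        } else {
            inPrefixRun = false;  // earlier prefixes are now treated as suffix-like
            pending.push_back(token);
        }
    }
    if (names.empty()) names.push_back(NameSeg());
    for (const std::string& taq : pending) names.back().taqs.push_back(taq);
    return names;
}
```

Under these assumptions, the tokens DR JOHN JR VAN SMITH III would yield two segments: JOHN carrying DR and JR, and SMITH carrying VAN and III.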



// function to generate index keys for this name.
// Each key includes a portion for the GN and a portion
// for the SN.
// We currently support two key lengths, 32 bits or 64 bits.
// The GN length does not have to be the same as the SN length,
// but GN keys generated must be the same length (similarly for
// SN). Thus the full key length could be:
//
//     64: Both GN and SN are 32 bits
//     96: GN is 64, but SN is 32
//     96: GN is 32, but SN is 64
//    128: Both GN and SN are 64 bits
//
// Keys are generated by name stem segment. The first key
// consists of a key for the first GN segment, and a key
// for the first SN segment. The second key
// consists of a key for the second GN segment, and a key
// for the second SN segment. When there are a differing number
// of GN and SN segments, the final segment of the name
// field with the fewer number of segments is repeated.
// Thus, the number of keys generated is given by the formula:
//    max(numGnSegs, numSnSegs)
//
// We do things this way so that a name has the same number of keys
// for both GN and SN, and in fact we can view the two keys as one
// contiguous key that can be passed to comparison functions as a
// single value.
//
// Note that we are talking about stem segments (TAQ segments have
// been removed).
//
// maxKeys specifies how many keys the caller can fit into
// keyBuff. It is up to the caller to make sure that they have allocated
// enough space in the keyBuff to hold maxKeys.

unsigned char NHNameData::genIndexKeys(int maxKeys, NHKeyWidth gnKeyWidth,
                                       NHKeyWidth snKeyWidth, void *keyBuff)
{
    int numKeysGenerated = 0;
    int gnSegIndex = 0;
    int snSegIndex = 0;
    unsigned int *keyPtr = (unsigned int *) keyBuff;

    while (numKeysGenerated < maxKeys) {
        if ((gnSegIndex >= numGnSegments) && (snSegIndex >= numSnSegments))
            break;
        else {
            numKeysGenerated++;

            // make sure that if one segment is now at the end,
            // we stay on the last segment
            if (gnSegIndex == numGnSegments)
                gnSegIndex--;
            if (snSegIndex == numSnSegments)
                snSegIndex--;

            if (gnKeyWidth == NH_KEY_WIDTH_32) {
                // gn key length is 32
                *keyPtr = globalDigraphBitmapArray.get32BitKeyForToken(gnSegments[gnSegIndex].segString);
                keyPtr++; // move the pointer by 4 bytes
            }
            else {
                // gn key length is 64
                globalDigraphBitmapArray.get64BitKeyForToken(gnSegments[gnSegIndex].segString,
                                                             (bit_64_t *)keyPtr);
                keyPtr += 2; // move the pointer by 8 bytes
            }

            if (snKeyWidth == NH_KEY_WIDTH_32) {
                // sn key length is 32
                *keyPtr = globalDigraphBitmapArray.get32BitKeyForToken(snSegments[snSegIndex].segString);
                keyPtr++; // move the pointer by 4 bytes
            }
            else {
                // sn key length is 64
                globalDigraphBitmapArray.get64BitKeyForToken(snSegments[snSegIndex].segString,
                                                             (bit_64_t *)keyPtr);
                keyPtr += 2; // move the pointer by 8 bytes
            }

            // advance the segment indexes
            snSegIndex++;
            gnSegIndex++;
        }
    }
    return numKeysGenerated;
}
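
As an illustration of the pairing rule in the comment block above (one key per stem segment, with the shorter field repeating its final segment), here is a minimal sketch. It only pairs the segment strings and sizes the combined key; it does not compute the digraph bitmap keys. The names `pairSegmentsForKeys` and `combinedKeyBytes` are hypothetical, and the sketch assumes each field has at least one stem segment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Pair GN and SN stem segments for key generation: key k uses segment k of
// each field, and the field with fewer segments repeats its last segment.
// At most maxKeys pairs are produced.
std::vector<std::pair<std::string, std::string> >
pairSegmentsForKeys(const std::vector<std::string>& gn,
                    const std::vector<std::string>& sn,
                    std::size_t maxKeys) {
    // Assumption of this sketch: both fields hold at least one stem segment.
    assert(!gn.empty() && !sn.empty());
    const std::size_t numKeys = std::min(maxKeys, std::max(gn.size(), sn.size()));
    std::vector<std::pair<std::string, std::string> > pairs;
    for (std::size_t k = 0; k < numKeys; ++k) {
        const std::string& g = gn[std::min(k, gn.size() - 1)];
        const std::string& s = sn[std::min(k, sn.size() - 1)];
        pairs.push_back(std::make_pair(g, s));
    }
    return pairs;
}

// Bytes needed for one combined GN+SN key, for widths given in bits (32 or 64).
std::size_t combinedKeyBytes(unsigned gnBits, unsigned snBits) {
    return (gnBits + snBits) / 8;
}
```

A caller sizing `keyBuff` would, under these assumptions, need `maxKeys * combinedKeyBytes(gnBits, snBits)` bytes; for a 64-bit GN key and a 32-bit SN key that is 12 bytes per key.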



// File: NHEvalNameData.cpp
//
// Description:
//
//    Implementation to the NHEvalNameData class.
//
//
// History:
//
//    5/14/97   EFB  Created
//    9/1/97    EFB  Lots of changes to support retaining segment scores in
//                   best mode so that sorting can be more detailed and accurate
//    10/31/97  EFB  Made several member functions protected, and made performComp()
//                   a friend of NHQueryNameData. Also changed performComp to
//                   NOT delete objects that are not passed on to the results list,
//                   to accommodate the new method of deleting NHEvalNameData objects.
//    11/03/97  EFB  Added a new function, calcNameScore() and made it virtual.
//                   Removed virtual from performComp. The performComp method
//                   was too complicated to be subclassed. We really only want
//                   callers to be able to affect the name score and the determination
//                   of HIT/NO_HIT. These are now the only virtual functions. Both
//                   are now inline in the header file so the caller knows exactly
//                   what is happening in these functions if they decide to subclass
//                   and override. OOPS, I forgot compareScore(), which is also
//                   virtual - we want them to be able to change how hits are sorted.
//
//    3/02/98   EFB  Made lots of changes necessary when I moved a bunch of
//                   parameters (the ones associated with parsing the name)
//                   from the NHCompParms class into a new class called NHNameParms,
//                   and renamed the NHCompParms class to NHCompParms.
//    3/20/98   EFB  Changed names to NH from SN



#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#include "NHEvalNameData.hpp"
#include "NHQueryNameData.hpp"
#include "NH_util.hpp"
#include "NH_queens_arrays.hpp"

#include "NHVariantTable.hpp"
#include "NHResultsList.hpp"
#include "NHTAQTable.hpp"
#include "NHNameParms.hpp"



// private, non-member function prototypes
static double NH_digraph_score(char *qSeg, int qSegLen,
                               char *evalSeg, int evalSegLen,
                               bool useLeftDigraphBias);

static double NH_best_score(int numQSegs, int numEvalSegs,
                            NHSegScoreMode scoreMode,
                            double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ]);

void NH_best_score_for_highest_mode(int xDim, int yDim, double highestScore,
                                    double *bestSegScores,
                                    double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ]);

static double NH_calc_score(SegList qSegs, int numQSegs,
                            SegList evalSegs, int numEvalSegs,
                            SegListVariants querySegmentVariants,
                            char *primaryCulture,
                            char *secondaryCulture,
                            NHCompParms *compParms,
                            NHNameParms *nameParms,
                            NHNameFields nameField,
                            char *origQNameField,
                            char *origEvalNameField,
                            int *numSegsScored,
                            double *bestSegScores);

static void NH_apply_TAQs_to_score(double *diScore, Segment *qSeg,
                                   Segment *evalSeg,
                                   double absDelTAQFactor,
                                   double absDisTAQFactor,
                                   double delTAQFactor,
                                   double disTAQFactor);

static bool NH_check_compressed_name(char *qSegString, char *evalSegString,
                                     char *compressCharsPart1,
                                     char *compressCharsPart2);

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *aGn, char *aSn) :
    NHNameData(nParms, aGn, aSn)
{
    resetScores();
}

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *aGn, char *aSn, char *aMn) :
    NHNameData(nParms, aGn, aSn, aMn)
{
    resetScores();
}

NHEvalNameData::NHEvalNameData(NHNameParms *nParms, char *name, NHNameFormat nameFormat) :
    NHNameData(nParms, name, nameFormat)
{
    resetScores();
}

// construct an object from an archived representation in
// a stream.
//
// The archive is in the following order
//
//    gnLen
//    snLen
//    nameStorage
NHEvalNameData::NHEvalNameData(NHNameParms *nParms, istream &inStream) :
    NHNameData(nParms, inStream)
{
    // read the gn, sn and name scores
    if (inStream)
        inStream.read((char *)&gnScore, sizeof(gnScore));
    if (inStream)
        inStream.read((char *)&snScore, sizeof(snScore));
    if (inStream)
        inStream.read((char *)&nameScore, sizeof(nameScore));

    // seg differentials
    if (inStream)
        inStream.read((char *)&gnSegDifferential, sizeof(gnSegDifferential));
    if (inStream)
        inStream.read((char *)&snSegDifferential, sizeof(snSegDifferential));

    // read the number of gn segs scored, and however many scores we need
    // inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
    if (inStream)
        inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
    if (inStream) {
        if (numGnSegsScored > 0) {
            inStream.read((char *)gnSegScores, numGnSegsScored * sizeof(double));
        }
    }

    // read the number of sn segs scored, and however many scores we need
    if (inStream)
        inStream.read((char *)&numSnSegsScored, sizeof(numSnSegsScored));
    if (inStream) {
        if (numSnSegsScored > 0) {
            inStream.read((char *)snSegScores, numSnSegsScored * sizeof(double));
        }
    }
}



NHEvalNameData::~NHEvalNameData()
{
}



bool NHEvalNameData::archiveData(ostream &outStream)
{
    bool rc = true;

    rc = NHNameData::archiveData(outStream);
    if (rc) {
        // write the gn, sn and name scores
        outStream.write((char *)&gnScore, sizeof(gnScore));
        outStream.write((char *)&snScore, sizeof(snScore));
        outStream.write((char *)&nameScore, sizeof(nameScore));

        // seg differentials
        outStream.write((char *)&gnSegDifferential, sizeof(gnSegDifferential));
        outStream.write((char *)&snSegDifferential, sizeof(snSegDifferential));

        // write the number of gn segs scored, and however many scores we need
        // inStream.read((char *)&numGnSegsScored, sizeof(numGnSegsScored));
        outStream.write((char *)&numGnSegsScored, sizeof(numGnSegsScored));
        if (numGnSegsScored > 0) {
            outStream.write((char *)gnSegScores, numGnSegsScored * sizeof(double));
        }

        // write the number of sn segs scored, and however many scores we need
        outStream.write((char *)&numSnSegsScored, sizeof(numSnSegsScored));
        if (numSnSegsScored > 0) {
            outStream.write((char *)snSegScores, numSnSegsScored * sizeof(double));
        }
    }
    return rc;
}

// note that this function is a friend of NHQueryNameData, which is
// why we are able to access private member functions of that class.
void inline NHEvalNameData::calcComponentScores(NHQueryNameData *queryName)
{
    char *primaryCulture = nameParms->primaryCultureCode;
    char *secondaryCulture = nameParms->secondaryCultureCode;

    // do the digraph compare and set the scores
    gnScore = NH_calc_score(queryName->gnSegments, queryName->numGnSegments,
                            gnSegments, numGnSegments,
                            queryName->gnSegmentVariants,
                            primaryCulture, secondaryCulture,
                            compParms,
                            nameParms,
                            NH_FIRST_NAME,
                            queryName->gn, gn,
                            &numGnSegsScored,
                            gnSegScores);

    snScore = NH_calc_score(queryName->snSegments, queryName->numSnSegments,
                            snSegments, numSnSegments,
                            queryName->snSegmentVariants,
                            primaryCulture, secondaryCulture,
                            compParms,
                            nameParms,
                            NH_LAST_NAME,
                            queryName->sn, sn,
                            &numSnSegsScored,
                            snSegScores);
}



// note that this function is a friend of NHQueryNameData, which is
// why we are able to access private member functions of that class.
NHReturnCode NHEvalNameData::performComp(NHQueryNameData *queryName,
                                         NHCompParms *someCompParms)
{
    NHReturnCode compResult;
    NHResultsList *resultList;

    // save the compParms so that they can be easily referenced
    // throughout the comparison process.
    compParms = someCompParms;

    calcComponentScores(queryName);

    // call a method to calculate the name score.
    calcNameScore();

    // store the segment differentials, in case we get a tie score.
    gnSegDifferential = abs(numGnSegments - queryName->getNumGnSegments());
    snSegDifferential = abs(numSnSegments - queryName->getNumSnSegments());

    // Now call the getCompResult() function to get the return value
    // (i.e. was it a match?)
    compResult = getCompResult();

    // now see if we are working with a results list
    resultList = queryName->getResultsList();
    if (resultList != NULL) {
        // we are using a result list. If this is a hit, add it
        // to the result list.
        // Otherwise, delete it
        if (compResult == NH_MATCH) {
            NHReturnCode tempInsertResult;

            // make sure the insert works. If so, don't mess with
            // the compResult, so the comparison will be returned
            // as a hit. If there was an error, delete this object,
            // and save the error code so it can be returned.
            tempInsertResult = resultList->addHit(this);
            if (tempInsertResult != NH_SUCCESS) {
                compResult = tempInsertResult;
            }
        }
    }
    return compResult;
}



// used only when the segment mode is set to HIGHEST.
// It compares the segment scores that were retained when
// the name was compared to the query name.
// We are comparing the segment scores for two (pre-scored)
// eval names. The comparison should find which name has
// the "best" set of segment scores, where best is defined
// as "the one with the highest best score". If the best
// score results in a tie, we move on to the second best score,
// and so on until we find a difference, or there are no more
// segments to compare. Each name has variables numGnSegsScored
// and numSnSegsScored, that tell how many segments were scored
// in the name. We do up to N comparisons, where N is the larger
// of the number of segments scored in each name. Where one name
// has fewer segments scored than the other, a default value of
// NH_DEFAULT_MISSING_SEGMENT_SCORE is assigned. This is so that
// a scored segment has to beat some threshold to be considered
// better than nothing at all.
double NHEvalNameData::compareSegmentScores(NHEvalNameData *scoredName,
                                            NHNameFields nameField)
{
    double scoreDiff = 0.0; // stays 0 if there are no segments to compare
    int maxComparisons;
    double *thisEvalScores;
    double *compEvalScores;
    int numSegsScoredForThisEval;
    int numSegsScoredForCompEval;

    if (nameField == NH_LAST_NAME) {
        thisEvalScores = snSegScores;
        compEvalScores = scoredName->snSegScores;
        numSegsScoredForThisEval = numSnSegsScored;
        numSegsScoredForCompEval = scoredName->numSnSegsScored;
    }
    else {
        thisEvalScores = gnSegScores;
        compEvalScores = scoredName->gnSegScores;
        numSegsScoredForThisEval = numGnSegsScored;
        numSegsScoredForCompEval = scoredName->numGnSegsScored;
    }

    maxComparisons = numSegsScoredForThisEval > numSegsScoredForCompEval ?
                     numSegsScoredForThisEval : numSegsScoredForCompEval;

    for (int i = 0; i < maxComparisons; i++) {
        if (i >= numSegsScoredForThisEval)
            thisEvalScores[i] = NH_DEFAULT_MISSING_SEGMENT_SCORE;
        else // we can do an else because only one segment can be missing, not both
            if (i >= numSegsScoredForCompEval)
                compEvalScores[i] = NH_DEFAULT_MISSING_SEGMENT_SCORE;

        scoreDiff = compEvalScores[i] - thisEvalScores[i];
        if (scoreDiff != 0)
            break;
    }

    return scoreDiff;
}
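
The tie-break rule described in the comment above (compare the best scores first, then the second best, and pad the shorter list with a default) can be sketched as a free function that leaves its inputs unmodified. This is an illustration only: `compareRankedScores` is a hypothetical name, the score lists are assumed to be already ordered best-first, and `kMissingSegmentScore` is a placeholder because the real value of `NH_DEFAULT_MISSING_SEGMENT_SCORE` is not shown in this listing.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Placeholder value, chosen only for illustration; the listing does not show
// the actual NH_DEFAULT_MISSING_SEGMENT_SCORE.
const double kMissingSegmentScore = 0.5;

// Returns the first non-zero difference (other - mine) between two score lists
// that are assumed to be sorted best-first. A list that runs out of scores is
// padded with kMissingSegmentScore. Returns 0.0 when the lists tie throughout.
double compareRankedScores(const std::vector<double>& mine,
                           const std::vector<double>& other) {
    const std::size_t n = std::max(mine.size(), other.size());
    for (std::size_t i = 0; i < n; ++i) {
        const double m = i < mine.size() ? mine[i] : kMissingSegmentScore;
        const double o = i < other.size() ? other[i] : kMissingSegmentScore;
        if (o != m)
            return o - m;
    }
    return 0.0;
}
```

With this sign convention, a positive result means the other name has the better set of segment scores. An extra scored segment only helps a name when its score exceeds the padding value, which mirrors the "better than nothing at all" threshold described above.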



/**********************
 * NH_calc_score
 *
 * Performs a string comparison on two name fields.
 * Returns a value between 0.00 and 1.00, with 1.00 being an exact fit.
 **********************/
double NH_calc_score(SegList qSegs, int numQSegs,
                     SegList evalSegs, int numEvalSegs,
                     SegListVariants querySegmentVariants,
                     char *primaryCulture,
                     char *secondaryCulture,
                     NHCompParms *compParms,
                     NHNameParms *nameParms,
                     NHNameFields nameField,
                     char *origQNameField,
                     char *origEvalNameField,
                     int *numSegsScored,
                     double *bestSegScores)
{
    NHAnchorSegMode anchorSeg;
    NHSegScoreMode  scoreMode;
    double          oopsFactor;
    double          absDelTAQFactor;
    double          absDisTAQFactor;
    double          delTAQFactor;
    double          disTAQFactor;
    bool            matchInit;
    double          initScore;
    double          initialOnInitialMatchScore;
    bool            checkVariant;
    // double       variantScore;
    bool            leftDigraphBias;
    double          anchorFactor;
    double          nameUnknownScore;
    double          noNameScore;
    double          scoresTable[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ]; // scores for segment pairs
    int             qIndex;          // temp index for query segments
    int             evalIndex;       // temp index for eval segments
    int             qSegLen;         // hold string length of query segment
    int             evalSegLen;      // hold string length of eval segment
    double          diScore = 0.0;   // temp score for single pair comparison
    double          hiScore = 0.0;   // temp score to hold best score as we iterate,
                                     // which lets us avoid NH_best_score in mode=BEST
    bool            areVariants;     // temp flag to hold if the pair are variants
    double          returnValue = 0.0;
    NHVariantTable *variantTable;
    double          varScore;
    NHVarId         evalSegVarId;
    bool            scoreTaqs;
    double          compressedNameScore;
    bool            checkCompressedName;

    // set some parameters based on the name field
    if (nameField == NH_LAST_NAME) {
        anchorSeg = compParms->getSnAnchorSegmentMode();
        scoreMode = compParms->getSnSegmentScoreMode();
        oopsFactor = compParms->getSnOOPSFactor();
        matchInit = compParms->getMatchSnIntial();
        initScore = compParms->getSnInitialScore();
        initialOnInitialMatchScore = compParms->getSnInitialOnInitialMatchScore();
        checkVariant = compParms->getUseSnVariants();
        anchorFactor = compParms->getSnAnchorFactor();
        leftDigraphBias = compParms->getUseSnLeftBias();
        nameUnknownScore = compParms->getLNUScore();
        noNameScore = compParms->getNLNScore();
        scoreTaqs = compParms->getScoreSnTAQs();
        absDelTAQFactor = compParms->getAbsDelSnTAQFactor();
        absDisTAQFactor = compParms->getAbsDisSnTAQFactor();
        delTAQFactor = compParms->getDelSnTAQFactor();
        disTAQFactor = compParms->getDisSnTAQFactor();
        compressedNameScore = compParms->getSnCompressedNameScore();
        checkCompressedName = compParms->getCheckSnCompressedName();
        variantTable = nameParms->snVariantTable;
    }
    else {
        anchorSeg = compParms->getGnAnchorSegmentMode();
        scoreMode = compParms->getGnSegmentScoreMode();
        oopsFactor = compParms->getGnOOPSFactor();
        matchInit = compParms->getMatchGnIntial();
        initScore = compParms->getGnInitialScore();
        initialOnInitialMatchScore = compParms->getGnInitialOnInitialMatchScore();
        checkVariant = compParms->getUseGnVariants();
        anchorFactor = compParms->getGnAnchorFactor();
        leftDigraphBias = compParms->getUseGnLeftBias();
        nameUnknownScore = compParms->getFNUScore();
        noNameScore = compParms->getNFNScore();
        scoreTaqs = compParms->getScoreGnTAQs();
        absDelTAQFactor = compParms->getAbsDelGnTAQFactor();
        absDisTAQFactor = compParms->getAbsDisGnTAQFactor();
        delTAQFactor = compParms->getDelGnTAQFactor();
        disTAQFactor = compParms->getDisGnTAQFactor();
        compressedNameScore = compParms->getGnCompressedNameScore();
        checkCompressedName = compParms->getCheckGnCompressedName();
        variantTable = nameParms->gnVariantTable;
    }



    // clear out the scores table
    for (qIndex = 0; qIndex < NH_MAX_SEGS_AFTER_TAQ; ++qIndex)
        for (evalIndex = 0; evalIndex < NH_MAX_SEGS_AFTER_TAQ; ++evalIndex)
            scoresTable[qIndex][evalIndex] = 0.0;

    // now go through each possible combination of segment pairs
    // (created by matching a query segment against an eval segment).
    // Store the scores in the scoresTable.
    for (qIndex = 0; qIndex < numQSegs; ++qIndex) {
        qSegLen = strlen(qSegs[qIndex].segString);

        for (evalIndex = 0; evalIndex < numEvalSegs; ++evalIndex) {
            evalSegLen = strlen(evalSegs[evalIndex].segString);

            // first check for either the query or eval segment being
            // blank.
            if ((qSegLen == 0) || (evalSegLen == 0)) {
                // We make a distinction between "unknown" and "none".
                // The table below shows the scores we assign for the
                // various combinations of Known - K, Unknown - U, and
                // None - N.
                //
                //     |  K             |  U                       |  N
                //  ---+----------------+--------------------------+--------------------------
                //  K  |  N/A           |  unknownScore            |  NoneScore
                //  U  |  unknownScore  |  (unknownScore + 1) / 2  |  (unknownScore + 1) / 2
                //  N  |  NoneScore     |  (unknownScore + 1) / 2  |  (NoneScore + 1) / 2
                //

                if (qSegs[qIndex].status == NH_NAME_FIELD_STATUS_KNOWN) {
                    // we should not need to check for both being known
                    if (evalSegs[evalIndex].status == NH_NAME_FIELD_STATUS_UNKNOWN)
                        diScore = nameUnknownScore;
                    else // must be NH_NAME_FIELD_STATUS_NON_EXISTANT
                        diScore = noNameScore;
                }
                else if (qSegs[qIndex].status == NH_NAME_FIELD_STATUS_UNKNOWN) {
                    if (evalSegs[evalIndex].status == NH_NAME_FIELD_STATUS_KNOWN)
                        diScore = nameUnknownScore;
                    else if (evalSegs[evalIndex].status == NH_NAME_FIELD_STATUS_UNKNOWN)
                        diScore = (nameUnknownScore + 1.0) / 2.0;
                    else // must be NH_NAME_FIELD_STATUS_NON_EXISTANT, same score as
                         // above, but we repeat it in case we change behavior later
                        diScore = (nameUnknownScore + 1.0) / 2.0;
                }
                else { // query must be NH_NAME_FIELD_STATUS_NON_EXISTANT
                    if (evalSegs[evalIndex].status == NH_NAME_FIELD_STATUS_KNOWN)
                        diScore = noNameScore;
                    else if (evalSegs[evalIndex].status == NH_NAME_FIELD_STATUS_UNKNOWN)
                        diScore = (nameUnknownScore + 1.0) / 2.0;
                    else // must be NH_NAME_FIELD_STATUS_NON_EXISTANT; both fields
                         // are "none", so this case has its own score
                        diScore = (noNameScore + 1.0) / 2.0;
                }
            }

            else {
                // check the variants if
                //   - we are supposed to
                //   - we have a list of variants to check
                //   - there is a variant for this segment of the query
                // Note we must check the secondary variants if the
                // primary check does not find a variant.
                areVariants = false;
                if (checkVariant && (querySegmentVariants != NULL) &&
                    (querySegmentVariants[qIndex] != NULL)) {
                    // so see if the eval name segment has any variants in
                    // the variant table
                    evalSegVarId = variantTable->getVariantIdForName(
                                       evalSegs[evalIndex].segString);
                    if (evalSegVarId != NH_VAR_NOT_FOUND) {
                        // yes, it did have some variants, so see if there
                        // is an intersection
                        varScore = querySegmentVariants[qIndex]->
                            getVariantScoreForIdAndCulture(evalSegVarId, primaryCulture);
                        if (varScore != NH_VARIANTS_NOT_RELATED) {
                            areVariants = true;
                            diScore = varScore;
                        }
                        else {
                            // variants were not related, so check for the
                            // secondary variant source.
                            // Put a check in here to see if the primary
                            // culture code was NH_CULTURE_CODE_GENERIC. If
                            // so, we can skip this check since the
                            // secondary code is always generic.
                            if (strcmp(primaryCulture, NH_CULTURE_CODE_GENERIC)) {
                                varScore = querySegmentVariants[qIndex]->
                                    getVariantScoreForIdAndCulture(evalSegVarId, secondaryCulture);
                                if (varScore != NH_VARIANTS_NOT_RELATED) {
                                    areVariants = true;
                                    diScore = varScore;
                                }
                            }
                        }
                    }
                }

                // now, if we did not find variants above, check for initials:
                // do we have an initial and are we supposed to check them?
                if (areVariants == false) {
                    if (matchInit && ((qSegLen == 1) || (evalSegLen == 1))) {
                        // does the first char match?
                        if (qSegs[qIndex].segString[0] ==
                            evalSegs[evalIndex].segString[0]) {
                            // if the second char matches, we have an initial
                            // on initial match, since we know the length of
                            // at least one of them is 1.
                            if (qSegs[qIndex].segString[1] ==
                                evalSegs[evalIndex].segString[1])
                                diScore = initialOnInitialMatchScore;
                            else
                                // initial match, but one was more than a
                                // single character, so assign initScore
                                diScore = initScore;
                        }
                        else
                            // no match at all, since first char was off
                            diScore = 0.0;
                    }
                    else
                        // else not initials, or we shouldn't check them.
                        // when here, we do not have unknowns, variants, or
                        // initials, so do a digraph comparison.
                        diScore = NH_digraph_score(qSegs[qIndex].segString, qSegLen,
                                                   evalSegs[evalIndex].segString, evalSegLen,
                                                   leftDigraphBias);
                } // end, if (areVariants == false)
            } // end, else, both segs are known (neither name is blank)

            /*
               Here we need to handle the oops and anchor segment
               parameters.

               oops specifies a factor to multiply by the score when the
               segments are not in the same position.

               AnchorSeg, AnchorFactor specify a factor to multiply matches
               that are in the same segment position, but are in a segment
               other than the stated AnchorSeg. Note that AnchorSeg does
               not get applied in average mode, because otherwise a 2
               segment name that was an exact match would get less than
               1.0, since the segment that was not in the anchor segment
               would be penalized. Anchor Factor is meant more to provide a
               penalty when a (relatively) unimportant segment is used as
               the sole contributor to the score.

               Note that only one of the factors may be applied, since oops
               only gets applied to segments that are out of place, and
               anchorFactor only gets applied to matches that are in place.

               AnchorSeg is also used to determine segment alignment.
               anchorSeg value 1 indicates segments should be lined up on
               the left, while value 2 indicates they should be lined up on
               the right. A value of 0 indicates they should be lined up on
               the left (this is the default).
            */



            switch (anchorSeg) {
                case 0: // no anchor segment designation
                    if (qIndex != evalIndex) // out of place, so apply oops
                        diScore *= oopsFactor;
                    break;

                case 1: // first segment is most important
                    if (qIndex != evalIndex) // out of place, so apply oops
                        diScore *= oopsFactor;
                    else
                        if ((qIndex != 0) && (scoreMode != NH_SEGMODE_AVG))
                            // if not the first segment (anchor seg),
                            // apply the anchorFactor, so long as the
                            // scoreMode is not NH_SEGMODE_AVG
                            diScore *= anchorFactor;
                    break;

                case 2: /* If not last-to-last match... */
                    if ((qIndex == numQSegs - 1) && (evalIndex == numEvalSegs - 1))
                        ; // no modification, since both are end segments
                    else {
                        // see if they are in the same position, counting
                        // back from the end
                        if ((numQSegs - qIndex) == (numEvalSegs - evalIndex)) {
                            if (scoreMode != NH_SEGMODE_AVG)
                                // skip anchor factor in average seg mode
                                diScore *= anchorFactor; // apply the anchorFactor
                        }
                        else
                            diScore *= oopsFactor;
                    }
                    break;
            }



            // Now we need to apply the TAQ values to the score,
            // but only if they wanted to, and we have a score
            // greater than 0 (otherwise, factors have no effect).
            if ((scoreTaqs) && (diScore > 0.0))
                NH_apply_TAQs_to_score(&diScore, &qSegs[qIndex],
                                       &evalSegs[evalIndex],
                                       absDelTAQFactor, absDisTAQFactor,
                                       delTAQFactor, disTAQFactor);

            // always store smaller dimension as rows
            if (numQSegs > numEvalSegs)
                scoresTable[evalIndex][qIndex] = diScore;
            else
                scoresTable[qIndex][evalIndex] = diScore;

            hiScore = hiScore > diScore ? hiScore : diScore;

        } // for evalIndex
    } // for qIndex



    // now figure out a composite score from all the best scores.
    // Note that for Best score, we must set the number of segments
    // that were scored, and fill an array containing those scores
    // (these will be used later to sort hits).
    // The exception to this is when either the query or the
    // eval name field has just 1 segment, in which case we only
    // score one segment, which becomes the score (in all modes).
    if ((numEvalSegs == 1) || (numQSegs == 1)) {
        if (scoreMode == NH_SEGMODE_HIGHEST) {
            *numSegsScored = 1;         // note that we only scored 1 segment
            bestSegScores[0] = hiScore; // save the singly scored segment
        }
        returnValue = hiScore;
    }
    else {
        // both have more than 1 segment
        if (numQSegs > numEvalSegs) { // always call functions with smaller dimension as rows
            if (scoreMode == NH_SEGMODE_HIGHEST) {
                NH_best_score_for_highest_mode(numEvalSegs, numQSegs, hiScore,
                                               bestSegScores, scoresTable);
                *numSegsScored = numEvalSegs; // note that we only scored numEvalSegs segments
                returnValue = hiScore;
            }
            else
                returnValue = NH_best_score(numEvalSegs, numQSegs, scoreMode, scoresTable);
        }
        else {
            if (scoreMode == NH_SEGMODE_HIGHEST) {
                NH_best_score_for_highest_mode(numQSegs, numEvalSegs, hiScore,
                                               bestSegScores, scoresTable);
                *numSegsScored = numQSegs; // note that we only scored numQSegs segments
                returnValue = hiScore;
            }
            else
                returnValue = NH_best_score(numQSegs, numEvalSegs, scoreMode, scoresTable);
        }
    }



    // here we need to see if we are supposed to check compressed names.
    // If so, we have to call the NH_check_compressed_name() function.
    // If that function returns true, we pick the higher of the
    // compressedScore (which is a parameter) and the current returnValue.
    if (checkCompressedName &&
        NH_check_compressed_name(origQNameField,
                                 origEvalNameField,
                                 nameParms->getSegmentBreakChars(),
                                 nameParms->getNoiseChars()))
        returnValue = returnValue > compressedNameScore ?
                      returnValue : compressedNameScore;

    return returnValue;

} /* NH_calc_score */
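
// The oops/anchor rules described inside NH_calc_score can be read as a
// single multiplier chosen per segment pair. The sketch below is one reading
// of those rules, not the listing's own code: positionFactor and the
// SegScoreMode enum are hypothetical names, and the bracing of the
// "anchorSeg == 2" case in the scanned listing is ambiguous, so the
// comment's stated intent is followed here.

```cpp
#include <cassert>

// Hypothetical stand-in for NHSegScoreMode.
enum SegScoreMode { SEGMODE_AVG, SEGMODE_LOWEST, SEGMODE_HIGHEST };

// Returns the factor a pair score would be multiplied by:
//   oopsFactor   when the two segments are not in the same position,
//   anchorFactor when they line up but are not the anchor segment
//                (skipped in average mode),
//   1.0          otherwise.
double positionFactor(int anchorSeg, SegScoreMode mode,
                      int qIndex, int numQSegs,
                      int evalIndex, int numEvalSegs,
                      double oopsFactor, double anchorFactor)
{
    const bool applyAnchor = (mode != SEGMODE_AVG);
    switch (anchorSeg) {
    case 1:  // align on the left; segment 0 is the anchor
        if (qIndex != evalIndex)
            return oopsFactor;
        return (qIndex != 0 && applyAnchor) ? anchorFactor : 1.0;
    case 2:  // align on the right; the last segment is the anchor
        if (numQSegs - qIndex != numEvalSegs - evalIndex)
            return oopsFactor;
        return (qIndex != numQSegs - 1 && applyAnchor) ? anchorFactor : 1.0;
    default: // 0: no anchor segment, align on the left
        return (qIndex != evalIndex) ? oopsFactor : 1.0;
    }
}
```

// Only one of the two factors is ever returned for a given pair, which
// matches the note that oops and anchorFactor are mutually exclusive.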



/* NH_check_compressed_name

   Compresses both names passed in, and sees if they are exact
   matches.

   The compression is implemented by skipping characters specified in
   compressChars.
*/
bool NH_check_compressed_name(char *qSegString, char *evalSegString,
                              char *compressCharsPart1,
                              char *compressCharsPart2)
{
    char compressedQuerySeg[NH_MAX_SEG_LENGTH + 1];
    char compressedEvalSeg[NH_MAX_SEG_LENGTH + 1];
    char compressChars[200 + 1];
    char *p;
    char *q;

    // first, combine the compressCharsPart1 and compressCharsPart2
    // strings
    strcpy(compressChars, compressCharsPart1);
    strcat(compressChars, compressCharsPart2);

    // compress the query segment
    for (p = qSegString, q = compressedQuerySeg; *p != EOS; p++)
        if (strchr(compressChars, *p) == NULL)
            *q++ = *p;
    *q = EOS;

    // compress the eval segment
    for (p = evalSegString, q = compressedEvalSeg; *p != EOS; p++)
        if (strchr(compressChars, *p) == NULL)
            *q++ = *p;
    *q = EOS;

    // at this point, we are not necessarily upper cased, so ignore case
    // during the string compare
    return !strcasecmp(compressedQuerySeg, compressedEvalSeg);

} /* NH_check_compressed_name */
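
// For comparison, the same "strip the break and noise characters, then
// compare ignoring case" test can be written without fixed-size buffers.
// This is only a sketch under assumptions: compress and compressedNamesMatch
// are hypothetical names, and std::string replaces the char arrays used in
// the listing.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Copies s, dropping every character found in skipChars and lower-casing
// the rest so that the later comparison ignores case.
static std::string compress(const std::string& s, const std::string& skipChars)
{
    std::string out;
    for (char c : s)
        if (skipChars.find(c) == std::string::npos)
            out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return out;
}

// True when the two names are identical once compressed.
bool compressedNamesMatch(const std::string& a, const std::string& b,
                          const std::string& breakChars,
                          const std::string& noiseChars)
{
    const std::string skip = breakChars + noiseChars;
    return compress(a, skip) == compress(b, skip);
}
```

// With a space as a break character, "De La Cruz" and "DELACRUZ" compress
// to the same string, so such a pair would be eligible for
// compressedNameScore.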



/* NH_best_score

   From a matrix of scores compute the highest possible combination
   of scores. During the evaluation of the matrix, a given row or
   column must provide one and only one score.

   We use a mode to determine how we calculate a score. The mode
   can be either NH_SEGMODE_AVG or NH_SEGMODE_LOWEST. Both modes
   start out by selecting the combination of values (with no row or
   column being used more than once) that gives the highest sum. Then,
   for mode = NH_SEGMODE_AVG, the final score is the average of all
   these scores. For NH_SEGMODE_LOWEST, it is the worst of
   these scores.

   If the matrix is non-square (x <> y), our final score calculation
   only includes N values, where N is the lesser dimension. We still
   use all the possible squares in the matrix to perform our selection,
   but the final score does not consider part of the matrix.

   To perform the work, we figure out which type of matrix we are
   dealing with (the dimensions). We use that to select an array that
   contains the column indexes for each valid combination of segments
   (where valid means no column participates twice).

   Our matrix always comes either as a square, or as a fat, short
   matrix. That is, the number of rows is always less than or equal to
   the number of columns. This way, we do not have to specify as many
   combination arrays, since we only have to cover a 2 X 3 array, and
   not a 3 X 2.

   Also, before this function, we see if either name has just 1
   segment, in which case we use the best score.
*/

double NH_best_score(int xDim, int yDim, NHSegScoreMode scoreMode,
                     double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ])
{
    byte *comboIndexesPtr; // points to array that holds valid column index combos
    int numCominations;

    switch (xDim) {
        case 2:
            switch (yDim) {
                case 2: // 2 by 2
                    comboIndexesPtr = twoByTwo;
                    numCominations = 2;
                    break;
                case 3: // 2 by 3
                    comboIndexesPtr = twoByThree;
                    numCominations = 6;
                    break;
                case 4: // 2 by 4
                    comboIndexesPtr = twoByFour;
                    numCominations = 12;
                    break;
                case 5: // 2 by 5
                    comboIndexesPtr = twoByFive;
                    numCominations = 20;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = twoByFive;
                    numCominations = 20;
                    break;
            }
            break;
        case 3:
            switch (yDim) {
                case 3: // 3 by 3
                    comboIndexesPtr = threeByThree;
                    numCominations = 6;
                    break;
                case 4: // 3 by 4
                    comboIndexesPtr = threeByFour;
                    numCominations = 24;
                    break;
                case 5: // 3 by 5
                    comboIndexesPtr = threeByFive;
                    numCominations = 60;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = threeByFive;
                    numCominations = 60;
                    break;
            }
            break;
        case 4:
            switch (yDim) {
                case 4: // 4 by 4
                    comboIndexesPtr = fourByFour;
                    numCominations = 24;
                    break;
                case 5: // 4 by 5
                    comboIndexesPtr = fourByFive;
                    numCominations = 120;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = fourByFive;
                    numCominations = 120;
                    break;
            }
            break;
        case 5:
            switch (yDim) {
                case 5: // 5 by 5
                    comboIndexesPtr = fiveByFive;
                    numCominations = 120;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = fiveByFive;
                    numCominations = 120;
                    break;
            }
            break;
        default: // must be greater than 5, so just use first five
                 // also, since xDim is <= yDim, we do not have to
                 // handle 5 X 2, 5 X 3, etc
            comboIndexesPtr = fiveByFive;
            numCominations = 120;
            break;
    }


    // we always use xDim matrix cells to compute the score, since it
    // is the smaller of the dimensions. We go through each combination
    // and evaluate the scores found in the scores array for the
    // particular combination of indexes.
    // Each evaluation must consider xDim values, so each pass through the
    // loop collects xDim values.
    // The values from the comboIndexesPtr array are the column indexes.
    // numCominations is the number of times we iterate through the loop to
    // look at a combination of elements in the score matrix.
    //
    // For example:
    // if I have a 2 X 3 matrix, I need to find the best valid 2 segment
    // combination (since 2 is xDim). There are 6 possible combinations,
    // and the column values are stored as pairs in the twoByThree array.
    // The row values are implicitly 0 and 1 for each pair, so I end up
    // checking:
    //     scores[0][twoByThree[0]]  + scores[1][twoByThree[1]];
    //     scores[0][twoByThree[2]]  + scores[1][twoByThree[3]];
    //     scores[0][twoByThree[4]]  + scores[1][twoByThree[5]];
    //     scores[0][twoByThree[6]]  + scores[1][twoByThree[7]];
    //     scores[0][twoByThree[8]]  + scores[1][twoByThree[9]];
    //     scores[0][twoByThree[10]] + scores[1][twoByThree[11]];
    //





    double tempScoreTotal;
    double tempLowScore;
    double tempVal;
    double highestTotal = 0.0;
    double bestLowScore = 0.0;
    int comboArrayIndex = 0;
    int i, row;

    for (i = 0; i < numCominations; i++) {
        tempScoreTotal = 0.0;
        tempLowScore = 1.0;

        for (row = 0; row < xDim; row++) {
            // get a single score
            tempVal = scores[row][comboIndexesPtr[comboArrayIndex]];

            // now see if score is the low score for this combo
            if (tempVal < tempLowScore)
                tempLowScore = tempVal;

            // include this cell in the total for this combination
            tempScoreTotal += tempVal;

            // look at next item in the combination (or the next combination)
            comboArrayIndex++;
        }

        // see if the low score is better than our previous low
        if (tempLowScore > bestLowScore)
            bestLowScore = tempLowScore;

        // see if this score is higher than our previous highest
        if (tempScoreTotal > highestTotal)
            highestTotal = tempScoreTotal;
    }

    if (scoreMode == NH_SEGMODE_AVG)
        return highestTotal / xDim;
    else
        return bestLowScore;
}
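
// The listing picks its column combinations from precomputed tables
// (twoByThree, threeByFour, and so on) whose contents are not shown here.
// As an illustration only, the average-mode result can also be obtained by
// enumerating column orders with std::next_permutation, a swapped-in
// technique and not the listing's method. bestAverageScore is a hypothetical
// name, and the sketch assumes the matrix has at least one row and no more
// rows than columns.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average of the best one-to-one pairing of rows to columns.
// Each row takes exactly one column, and no column is used twice.
double bestAverageScore(const std::vector<std::vector<double>>& scores)
{
    const int rows = static_cast<int>(scores.size());
    const int cols = static_cast<int>(scores[0].size());
    assert(rows >= 1 && rows <= cols);

    // colOrder[row] is the column paired with that row; entries beyond
    // the first 'rows' positions are simply unused columns.
    std::vector<int> colOrder(cols);
    std::iota(colOrder.begin(), colOrder.end(), 0);

    double highestTotal = 0.0;
    do {
        double total = 0.0;
        for (int row = 0; row < rows; ++row)
            total += scores[row][colOrder[row]];
        highestTotal = std::max(highestTotal, total);
    } while (std::next_permutation(colOrder.begin(), colOrder.end()));

    return highestTotal / rows;
}
```

// This visits some pairings more than once when there are more columns than
// rows, which is wasteful but harmless at the sizes the listing handles
// (five segments at most); fixed tables avoid that repeated work.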



/* NH_best_score_for_highest_mode

   This is a special version of NH_best_score. For a complete
   description of how the combination stuff works, see the comments
   for NH_best_score.

   We made this a separate function because:
     - it has to return (by reference) an array of scores. The other
       modes only have to return a score for the name.
     - The way we figure out which array of scores to return is
       much more involved than NH_best_score.
     - Since we only do this stuff in highest mode, we did not
       want to slow down the processing of NH_best_score by passing
       extra parameters and adding lots of "if" statements.

   This function was added so that we can figure out which
   combination of segments gives us the highest scores, with
   the highest score being most important, the next highest score
   being the second most important, etc. Note that this is
   different from average score, where we are looking for the
   highest sum of scores. In that case, the highest score is no
   more important than the lowest score. In fact, the combination
   chosen in average mode might not even include the single highest
   segment score.

   To achieve our goal, we evaluate each possible combination of
   index pairings. Each combination gives us an array of N scores,
   where N is the smaller dimension in the matrix.
   We sort each combination so that the highest score appears first
   in the array, and so on. If this is the first combination we
   have evaluated, it becomes the one to beat, so we fill up the
   array of scores we were passed by reference with this array of
   scores. We then go through the rest of the combinations looking
   for an array that beats the current one to beat. To beat it, as
   we walk through the array, we compare the scores from each array.
   If they are equal, we move on to the next one. Otherwise, the
   higher score wins.

   To help speed things up, we get passed in the high score, which
   is the high score of the entire matrix (note this high score could
   appear more than once). We use this high score to quickly discount
   combinations as not being possible contenders. If, after sorting
   a contender array, the first item is not the high score we were
   passed, this combination could not possibly be the one, so why
   bother copying all the array elements?

   Note that we check before entering this function to make sure
   both dimensions are bigger than 1. And we make sure that
   xDim is the smaller of the dimensions (or they are equal).
*/



void NH_best_score_for_highest_mode(int xDim, int yDim, double highestScore,
                                    double *bestSegScores,
                                    double scores[NH_MAX_SEGS_AFTER_TAQ][NH_MAX_SEGS_AFTER_TAQ])
{
    byte *comboIndexesPtr; // points to array that holds valid column index combos
    int numCominations;


    switch (xDim) {
        case 2:
            switch (yDim) {
                case 2: // 2 by 2
                    comboIndexesPtr = twoByTwo;
                    numCominations = 2;
                    break;
                case 3: // 2 by 3
                    comboIndexesPtr = twoByThree;
                    numCominations = 6;
                    break;
                case 4: // 2 by 4
                    comboIndexesPtr = twoByFour;
                    numCominations = 12;
                    break;
                case 5: // 2 by 5
                    comboIndexesPtr = twoByFive;
                    numCominations = 20;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = twoByFive;
                    numCominations = 20;
                    break;
            }
            break;
        case 3:
            switch (yDim) {
                case 3: // 3 by 3
                    comboIndexesPtr = threeByThree;
                    numCominations = 6;
                    break;
                case 4: // 3 by 4
                    comboIndexesPtr = threeByFour;
                    numCominations = 24;
                    break;
                case 5: // 3 by 5
                    comboIndexesPtr = threeByFive;
                    numCominations = 60;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = threeByFive;
                    numCominations = 60;
                    break;
            }
            break;
        case 4:
            switch (yDim) {
                case 4: // 4 by 4
                    comboIndexesPtr = fourByFour;
                    numCominations = 24;
                    break;
                case 5: // 4 by 5
                    comboIndexesPtr = fourByFive;
                    numCominations = 120;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = fourByFive;
                    numCominations = 120;
                    break;
            }
            break;
        case 5:
            switch (yDim) {
                case 5: // 5 by 5
                    comboIndexesPtr = fiveByFive;
                    numCominations = 120;
                    break;
                default: // must be greater than 5, so just use first five
                    comboIndexesPtr = fiveByFive;
                    numCominations = 120;
                    break;
            }
            break;
        default: // must be greater than 5, so just use first five
                 // also, since xDim is <= yDim, we do not have to
                 // handle 5 X 2, 5 X 3, etc
            comboIndexesPtr = fiveByFive;
            numCominations = 120;
            break;
    }

    // we always use xDim matrix cells to compute the score, since it
    // is the smaller of the dimensions. We go through each combination
    // and evaluate the scores found in the scores array for the
    // particular combination of indexes.
    // Each evaluation must consider xDim values, so each pass through the
    // loop collects xDim values.
    // The values from the comboIndexesPtr array are the column indexes.
    // numCominations is the number of times we iterate through the loop to
    // look at a combination of elements in the score matrix.
    //
    // For example:
    // if I have a 2 X 3 matrix, I need to find the best valid 2 segment
    // combination (since 2 is xDim). There are 6 possible combinations,
    // and the column values are stored as pairs in the twoByThree array.
    // The row values are implicitly 0 and 1 for each pair, so I end up
    // checking:
    //     scores[0][twoByThree[0]]  + scores[1][twoByThree[1]];
    //     scores[0][twoByThree[2]]  + scores[1][twoByThree[3]];
    //     scores[0][twoByThree[4]]  + scores[1][twoByThree[5]];
    //     scores[0][twoByThree[6]]  + scores[1][twoByThree[7]];
    //     scores[0][twoByThree[8]]  + scores[1][twoByThree[9]];
    //     scores[0][twoByThree[10]] + scores[1][twoByThree[11]];


■^..^ double tempSegScores [NH_MAX_SEGS_AFTER_TAQ1 ; 

int comboArray Index = 0; 

int i/ row; 

bool includesHighestScore; 
double swapVal; 
int templndex; 
double compVal; 
int numChanges ; 

double tempVal; 

// init the temp seg scores array to zeros, so that the first 
// comparison will fail. 

for (templndex = 0; templndex < xDim; templndex++) { 
bestSegScores [templndex] = 0; 

} 

for (i «= 0; i < numCominations; i++) { 

IncludesHighestScore = false; // assume this <:ombo does 

not 

// include the best score 
for (row= 0; row < xDim; row++) { 
// get a singlje score 
tempVal = 

scores [row] [comboIndexesPtr [comboArraylndexl ] ; 

// now see if score is the low score for this combo 
if (tempVal == highestScore) 

includesHighestScore = true; 

// save this value as part of our temp array of 

scores , , " 

// that we will sort below 
tempSegScores [row] = tempVal; 

// look at next item in the combination (or the 

next combination) 

comboArrayIndex++; 

        // see if this combo includes the best score. If so, sort it
        // and then compare it to the current numbers in bestSegScores.
        if (includesHighestScore == true) {
            // sort the numbers in tempSegScores (highest first)
            while (1) {
                numChanges = 0;
                for (tempIndex = 1; tempIndex < xDim; tempIndex++) {
                    if (tempSegScores[tempIndex - 1] < tempSegScores[tempIndex]) {
                        swapVal = tempSegScores[tempIndex - 1];
                        tempSegScores[tempIndex - 1] = tempSegScores[tempIndex];
                        tempSegScores[tempIndex] = swapVal;
                        numChanges++;
                    }
                }
                if (numChanges == 0)
                    break;
            }

            // now compare these temp scores to the current best scores
            for (tempIndex = 0; tempIndex < xDim; tempIndex++) {
                compVal = tempSegScores[tempIndex] - bestSegScores[tempIndex];
                if (compVal > 0) {
                    // temp scores are better, so replace the best scores with them
                    for (tempIndex = 0; tempIndex < xDim; tempIndex++) {
                        bestSegScores[tempIndex] = tempSegScores[tempIndex];
                    }
                    break;
                }
                else if (compVal < 0) {
                    // current scores are better, so break out
                    break;
                }
                // otherwise, just continue the loop.
            }
        }
    }
}
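
The selection rule used by the loop above (sort a candidate combination highest-first, then let the first differing position decide) is a lexicographic comparison. The following is a minimal sketch of that rule in standard C++, not the listing's own code; the function name isBetterCombo and the use of std::vector are assumptions made only for illustration.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Hypothetical helper (not part of the listing): returns true when the
// candidate scores, sorted highest-first, beat the current best scores.
// `best` is assumed to be sorted highest-first already and to have the
// same length as `candidate`.
bool isBetterCombo(std::vector<double> candidate,
                   const std::vector<double>& best)
{
    // Sort the copy in descending order, as the bubble sort above does.
    std::sort(candidate.begin(), candidate.end(), std::greater<double>());

    // lexicographical_compare(a, b) is true when a < b at the first
    // position where the two sequences differ.
    return std::lexicographical_compare(best.begin(), best.end(),
                                        candidate.begin(), candidate.end());
}
```

With a current best of {0.9, 0.4}, a candidate {0.5, 0.9} sorts to {0.9, 0.5} and wins on the second position, while {0.3, 0.9} loses and an identical combination leaves the best scores unchanged.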



/* digraph_score

    This is the core of the name-check algorithm.

    A value from 0.0 to 1.0 is calculated based on the number of
    digraphs which match between the two given strings.

    A bias can be used so that digraphs on the right end of the
    strings count less than those on the left.

    Notes:

    The routine ensures that a digraph can only participate in a
    match once.

    Each match results in two points being added to the total. The
    final score is the total number of points divided by the number
    of digraphs that could have matched.

    The bias works by discounting the score we award for a digraph
    match. As we move into the segment, we subtract 0.1 from the
    match score.

    The weight table is used to adjust the divisor (which is normally
    the total number of digraphs that could have matched). In the case
    of bias, we need to decrease that number. Otherwise, an exact match
    would not return a 1.0, since we would only be deducting from the
    score (the numerator), and not the divisor. The weight table factors
    correspond to the score that would be assigned to an exact match for
    each possible length. In other words, we start at 1, then add .9, then
    add .8, etc. (the same pattern we use to deduct from the match score)
*/

double NH_digraph_score (char *qSeg, int qSegLen,
                         char *evalSeg, int evalSegLen,
                         bool useLeftDigraphBias)
{
    char tempDigraphStr[2 + 1];     // storage for a digraph string

    // terminate the temp digraph string once
    tempDigraphStr[2] = EOS;

    // These are the weights a name has when using a biased
    // (left-to-right) calculation. They end up being used as the denominator
    // for the final score calculation
    static const double NH_dig_bias_weights[NH_MAX_SEG_LENGTH + 2]
        = { 1.0, 1.0, 1.9, 2.7, 3.4, 4.0, 4.5, 4.9, 5.2, 5.4, 5.5, 5.6, 5.7,
            5.8, 5.9, 6.0, 6.1, 6.2, 6.3, 6.4,
            6.5, 6.6, 6.7, 6.8, 6.9, 7.0,
            7.1, 7.2, 7.3, 7.4, 7.5, 7.6 };

    // an array of 'Y' or 'N' values, one for each possible digraph
    // position in the eval segment. Each starts out at 'N' and gets
    // set to 'Y' when (and if) it gets used.
    // Note that we must add 1 because we normally pad the name with
    // spaces.
    char alreadyMatched[NH_MAX_SEG_LENGTH + 1]; // max digraphs = NAME_SIZE + 1

    // Forget all previous matches.
    memset(alreadyMatched, 'N', sizeof alreadyMatched);

    // Now count the number of elements involved in matching.
    double qBiasFactor = 0.9;       // 0.9; see note below
    double evalBiasFactor = 0.9;    // see note below
    double matchPoints;
    char *evalSegString;

    // start out by checking the first character, which is a special
    // case. It forms an implied digraph of " X" (space, followed by
    // the character). Thus, if both the query and eval have the same
    // first character, we give them 2 match points.
    // Also, since we really start our loop with the second
    // digraph, we set the bias factors to 0.9 rather than 1.0
    if (qSeg[0] == evalSeg[0]) {
        matchPoints = 2.0;
    }
    else
        matchPoints = 0.0;

    for (int queryIndex = 0; queryIndex < qSegLen - 1; ++queryIndex) {
        /* see if this digraph occurs in database name */
        tempDigraphStr[0] = qSeg[queryIndex];
        tempDigraphStr[1] = qSeg[queryIndex + 1];
        evalSegString = evalSeg;

        if (useLeftDigraphBias) {
            // bring down the query bias by 0.1 each time, until we get to 0.1
            if ((queryIndex > 0) && (queryIndex < 10))
                qBiasFactor -= 0.1;
        }

        do {
            evalSegString = strstr(evalSegString, tempDigraphStr);
            if (evalSegString != NULL) {
                int evalMatchOffset = evalSegString - evalSeg;

                if (alreadyMatched[evalMatchOffset] == 'N') {
                    alreadyMatched[evalMatchOffset] = 'Y';
                    if (useLeftDigraphBias) {
                        /* decrement eval match-bias, minimum 0.10 */
                        evalBiasFactor = 1.0 - 0.1 * (evalMatchOffset + 1);
                        if (evalBiasFactor < 0.1)
                            evalBiasFactor = 0.1;
                        matchPoints += qBiasFactor + evalBiasFactor;
                    }
                    else
                        matchPoints += 2.0;
                    break;
                }
                evalSegString++;
            }
        } while (evalSegString != NULL);
    }

    // now do a check for the "hidden" digraph at the end of the segment
    // to account for the non-existent trailing space
    if (qSeg[qSegLen - 1] == evalSeg[evalSegLen - 1]) {
        if (useLeftDigraphBias) {
            evalBiasFactor = 1.0 - 0.1 * evalSegLen;
            if (evalBiasFactor < 0.1)
                evalBiasFactor = 0.1;
            // don't forget to bring down the query bias by 0.1 also,
            // unless we are at 0.1
            if ((queryIndex > 0) && (queryIndex < 10))
                qBiasFactor -= 0.1;
            matchPoints += qBiasFactor + evalBiasFactor;
        }
        else
            matchPoints += 2.0;
    }



    // The return value is the number of elements involved in matching
    // compared to the total number of elements.
    return useLeftDigraphBias
        ? matchPoints /
            (NH_dig_bias_weights[qSegLen + 1] + NH_dig_bias_weights[evalSegLen + 1])
        : matchPoints / (qSegLen + evalSegLen + 2);
} /* NH_digraph_score */



/*
    This function adjusts the diScore (which already has some value) based
    on the TAQ values that are attached to the two segments passed in.

    In the NameHunter system, TAQs are broken up into two types (disregard and
    delete). In general, disregard TAQs (e.g. "Jr.") contain more
    meaningful information than delete TAQs (e.g. "Mr."), and thus
    disregard TAQs are considered more important when evaluating/comparing
    TAQs between segments.

    There are three factors involved in modifying the score. These are

        delete factor
        disregard factor
        absent factor

    When applied, a factor is multiplied by the existing score. However,
    deciding which factor (if any) to apply is somewhat complex, especially
    when one or both of the segments have multiple TAQ values. For this
    reason, we describe the multi-TAQ situation separately.

    For situations where both segments have either 0 or 1 TAQ values, we
    use the following matrix to choose a factor to apply:

                   |  No TAQ         |  Delete TAQ      |  Disregard TAQ
    ---------------+-----------------+------------------+--------------------
    No TAQ         |  No Change      |  Absent Factor   |  Absent Factor
    ---------------+-----------------+------------------+--------------------
    Delete TAQ     |  Absent Factor  |  Delete Factor   |  Absent Factor
                   |                 |  unless same     |
    ---------------+-----------------+------------------+--------------------
    Disregard TAQ  |  Absent Factor  |  Absent Factor   |  Disregard Factor
                   |                 |                  |  unless same



    For the multiple case, we use the algorithm below. A general word
    about the alg - we are treating disregard as more important than
    delete, so we start out by checking for disregards. All it takes
    is for one disregard value in each of the segments to match to
    avoid applying the disregard factor. The same goes for deletes.
    If we have any dis values in one segment, but none in the other,
    we apply the absent factor.

    Assuming segments S1 and S2:

    Look for dis segments in S1
    if found
        if same segment found in S2
            go on to delete processing.
        else
            if no dis segments in S2
                apply absent value
            else, continue looking for dis segments in S1 that match S2
        if we get to end of S1 segments and still have not found a
        matching dis in S2, apply dis factor.
    else (no dis found in S1)
        look for dis in S2
        if found
            apply absent
        else
            go on to delete processing

    Delete processing:

    look for deletes in S1
    if found
        if same seg found in S2
            do nothing
        else
            if no deletes in S2,
                apply absent
            else
                continue to look for deletes in S1.
        If we get to end of S1 segments and do not find any
        deletes that match a delete in S2, apply delete factor
    else (no deletes found in S1)
        look for deletes in S2
        if delete found
            apply absent
        else
            do nothing.
*/

void NH_apply_TAQs_to_score (double *diScore, Segment *qSeg, Segment *evalSeg,
                             double absDelTAQFactor,
                             double absDisTAQFactor,
                             double delTAQFactor,
                             double disTAQFactor)
{
    int numQTAQs = qSeg->numTAQs;
    int numEvalTAQs = evalSeg->numTAQs;
    double applyFactor = 1.0;

    // handle the simple case first
    if ((numQTAQs <= 1) && (numEvalTAQs <= 1)) {
        switch (numQTAQs) {
        case 0:
            if (numEvalTAQs == 1) {
                if (evalSeg->taqList[0].taqAction == NH_TAQ_ACTION_DELETE)
                    applyFactor = absDelTAQFactor;
                else
                    applyFactor = absDisTAQFactor;
            }
            break;
        case 1:
            if (numEvalTAQs == 1) {
                // both segs have 1 TAQ value, so
                // figure out the type of TAQs involved
                if (qSeg->taqList[0].taqAction == NH_TAQ_ACTION_DELETE) {
                    if (evalSeg->taqList[0].taqAction == NH_TAQ_ACTION_DELETE) {
                        // same action, so see if strings are the same
                        if (strcmp(qSeg->taqList[0].segString,
                                   evalSeg->taqList[0].segString))
                            applyFactor = delTAQFactor; // they were different, so apply delete factor
                    }
                    else    // not the same action, so do the absent
                        applyFactor = absDisTAQFactor;
                }
                else {      // not NH_TAQ_ACTION_DELETE, so must be disregard
                    if (evalSeg->taqList[0].taqAction == NH_TAQ_ACTION_DISREGARD) {
                        // same action, so see if strings are the same
                        if (strcmp(qSeg->taqList[0].segString,
                                   evalSeg->taqList[0].segString))
                            applyFactor = disTAQFactor; // they were different, so apply dis factor
                    }
                    else    // not the same action, so do the absent dis
                        applyFactor = absDisTAQFactor; // since dis takes precedence over del
                }
            }
            else {  // query had 1 TAQ, but eval had none
                if (qSeg->taqList[0].taqAction == NH_TAQ_ACTION_DELETE)
                    applyFactor = absDelTAQFactor;
                else
                    applyFactor = absDisTAQFactor;
            }
            break;
        }
    }
    else {
        // one (or both) of the segments has more than 1 TAQ value

        // First see if either has no TAQ segments. In this case,
        // we can apply the absent factor and skip the ugly processing
        // below
        if (numQTAQs == 0) {
            // assume the abs del factor, but look for a DIS in the
            // eval. If we find one, set the applyFactor to the abs dis
            // since that should take precedence
            applyFactor = absDelTAQFactor;
            for (int evalIndex = 0; evalIndex < numEvalTAQs; evalIndex++) {
                if (evalSeg->taqList[evalIndex].taqAction == NH_TAQ_ACTION_DISREGARD) {
                    applyFactor = absDisTAQFactor;
                    break;
                }
            }
        }
        else if (numEvalTAQs == 0) {
            // assume the abs del factor, but look for a DIS in the
            // query. If we find one, set the applyFactor to the abs dis
            // since that should take precedence
            applyFactor = absDelTAQFactor;
            for (int qIndex = 0; qIndex < numQTAQs; qIndex++) {
                if (qSeg->taqList[qIndex].taqAction == NH_TAQ_ACTION_DISREGARD) {
                    applyFactor = absDisTAQFactor;
                    break;
                }
            }
        }

        else {
            // one segment has 2 or more TAQs, and the other has one or more
            bool satisfiedDis = true;   // we assume we have satisfied the
                                        // dis processing until we find
                                        // a dis value, since if neither
                                        // seg has a dis value, we do not
                                        // apply the dis value
            bool satisfiedDel = true;   // we assume we have satisfied the
                                        // del processing until we find
                                        // a del value, since if neither
                                        // seg has a del value, we do not
                                        // apply the del value
            bool satisfiedAbs = true;   // we assume we have satisfied the
                                        // abs processing.
            bool foundMatchingDis = false;
            bool foundMatchingDel = false;

            // go through the query segment, looking for dis segments
            for (i = 0; i < numQTAQs; i++) {
                if (qSeg->taqList[i].taqAction == NH_TAQ_ACTION_DISREGARD) {
                    // since we found a dis, we must find a dis in the eval seg.
                    satisfiedDis = false;
                    satisfiedAbs = false;

                    // look for disregards in the eval seg.
                    for (j = 0; j < numEvalTAQs; j++) {
                        if (evalSeg->taqList[j].taqAction == NH_TAQ_ACTION_DISREGARD) {
                            // found a dis, so we are not dealing with an absent
                            // situation - see if the segs are the same.
                            satisfiedAbs = true;
                            if (!strcmp(qSeg->taqList[i].segString,
                                        evalSeg->taqList[j].segString)) {
                                foundMatchingDis = true;
                                satisfiedDis = true;
                                break;
                            }
                        }
                    }

                    // if we get here, and the abs has not been satisfied, we
                    // apply the abs factor, since we did not find any dis in the
                    // eval, but did find one in the query.
                    if (satisfiedAbs == false) {
                        applyFactor = absDisTAQFactor;
                        // mark the DIS as satisfied so that we do not
                        // re-assign the factor below when seeing if DEL was satisfied.
                        satisfiedDis = true;
                        break;
                    }
                    else {
                        // check to see if we satisfied the dis. If we did, we can
                        // go check out the delete stuff.
                        if (satisfiedDis == true)
                            break;
                    }
                }
            } // end for query TAQ

            // once here, we made it to the end of the query TAQs while looking
            // for disregards. This means either:
            //   we found no disregards in the query - so go on
            //       and see if there are any disregards in the Eval
            //   we found disregards in Q, but none in Eval - we
            //       apply the absent factor, and we're done
            //   we found dis in Q, but no matching ones in Eval - we
            //       apply the disregard factor, and we're done
            //   we found a matching dis in Q and Eval - so do deletes.
            //       we can skip the check for disregards in Eval, since
            //       we already know there is a match.

            // make sure we should continue
            if (satisfiedAbs && satisfiedDis) {
                if (foundMatchingDis == false) {
                    // We are in this section if the Q had no Dis Values.
                    // see if there are dis values in Eval.
                    for (j = 0; j < numEvalTAQs; j++) {
                        if (evalSeg->taqList[j].taqAction == NH_TAQ_ACTION_DISREGARD) {
                            applyFactor = absDisTAQFactor;
                            satisfiedAbs = false;
                            break;
                        }
                    }
                }

                // see if we should still continue after checking for reverse absent
                if (satisfiedAbs) {
                    // when here, we got past checking for the dis, so we need
                    // to check for deletes.
                    // go through the query segment, looking for del segments
                    for (i = 0; i < numQTAQs; i++) {
                        if (qSeg->taqList[i].taqAction == NH_TAQ_ACTION_DELETE) {
                            // since we found a del, we must find a del in the eval seg.
                            satisfiedDel = false;
                            satisfiedAbs = false;

                            // look for deletes in the eval seg.
                            for (j = 0; j < numEvalTAQs; j++) {
                                if (evalSeg->taqList[j].taqAction == NH_TAQ_ACTION_DELETE) {
                                    // found a del, so we are not dealing with an absent
                                    // situation - see if the segs are the same.
                                    satisfiedAbs = true;
                                    if (!strcmp(qSeg->taqList[i].segString,
                                                evalSeg->taqList[j].segString)) {
                                        foundMatchingDel = true;
                                        satisfiedDel = true;
                                        break;
                                    }
                                }
                            }

                            // if we get here, and the abs has not been satisfied, we
                            // apply the abs factor, since we did not find any del in the
                            // eval, but did find one in the query.
                            if (satisfiedAbs == false) {
                                applyFactor = absDelTAQFactor;
                                // mark the DEL as satisfied so that we do not
                                // re-assign the factor below when seeing if DEL was satisfied
                                satisfiedDel = true;
                                break;
                            }
                            else {
                                // check to see if we satisfied the del. If we did, we're done
                                if (satisfiedDel == true)
                                    break;
                            }
                        }
                    } // end for query TAQ

                    // make sure we should continue
                    if (satisfiedAbs && satisfiedDel) {
                        if (foundMatchingDel == false) {
                            // We are in this section if the Q had no Del Values.
                            // see if there are del values in Eval.
                            for (j = 0; j < numEvalTAQs; j++) {
                                if (evalSeg->taqList[j].taqAction == NH_TAQ_ACTION_DELETE) {
                                    applyFactor = absDelTAQFactor;
                                    satisfiedAbs = false;
                                    break;
                                }
                            }
                        }
                    }
                }
            }

            // decide the factor based on the condition that was not satisfied
            // except for abs, in which case we already set the applyFactor
            // above
            if (satisfiedDel == false)
                applyFactor = delTAQFactor;
            else if (satisfiedDis == false)
                applyFactor = disTAQFactor;
        }
    }

    // apply the factor we decided on
    *diScore *= applyFactor;
}
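
For the simple case (each segment carries at most one TAQ), the choice of factor reduces to a small decision table. The sketch below restates that table as a standalone function. The types TaqKind and TaqFactors and the function chooseTaqFactor are hypothetical names introduced only for this illustration; they are not part of the listing, which works on Segment structures instead.

```cpp
#include <string>

// Hypothetical stand-ins for the listing's TAQ action codes.
enum TaqKind { TAQ_NONE, TAQ_DELETE, TAQ_DISREGARD };

// Hypothetical bundle of the four factors passed to the real routine.
struct TaqFactors {
    double absDel;  // one side has a delete TAQ, the other has none
    double absDis;  // one side has a disregard TAQ, the other lacks a matching kind
    double del;     // both sides have delete TAQs, but the strings differ
    double dis;     // both sides have disregard TAQs, but the strings differ
};

// Returns the multiplier for the single-TAQ matrix; 1.0 means "no change".
double chooseTaqFactor(TaqKind qKind, const std::string& qText,
                       TaqKind eKind, const std::string& eText,
                       const TaqFactors& f)
{
    if (qKind == TAQ_NONE && eKind == TAQ_NONE)
        return 1.0;
    if (qKind == TAQ_NONE)
        return eKind == TAQ_DELETE ? f.absDel : f.absDis;
    if (eKind == TAQ_NONE)
        return qKind == TAQ_DELETE ? f.absDel : f.absDis;
    if (qKind != eKind)
        return f.absDis;            // disregard takes precedence over delete
    if (qText == eText)
        return 1.0;                 // same kind and same string: leave the score alone
    return qKind == TAQ_DELETE ? f.del : f.dis;
}
```

With the generic defaults from NHCompParms (absent-delete 0.9, absent-disregard 0.8, delete 0.85, disregard 0.7), a "MR" versus "DR" pair of delete TAQs would scale the score by 0.85, while "JR" versus "SR" disregard TAQs would scale it by 0.7.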



// DigraphBitmapArray.hpp : header file
//
// Class that holds the bit patterns for each possible
// digraph (AA - ZZ). We also need to account for spaces.
//
// Each bit pattern turns on just 1 bit. We basically turn
// on one bit, and shift it through the value until it reaches
// the other end, at which time we start back at the beginning
// again.
//
// Any other characters are treated as spaces in our scheme,
// so we do not need to worry about them.
//
// The class supports either a 32 bit value, or a 64 bit value.
////////////////////////////////////////////////////////////////////////

#ifndef DIGRAPHBITMAPARRAY_HPP
#define DIGRAPHBITMAPARRAY_HPP

// How many indexes do we need in our two dimensional array?
// 27 (26 letters plus a space)
#define BITMAP_ARRAY_INDEX_SIZE 27

typedef struct {
    unsigned int hiBytes;
    unsigned int lowBytes;
} bit_64_t;

class NHDigraphBitmapArray
{
// Construction
public:
    NHDigraphBitmapArray();     // standard constructor
    ~NHDigraphBitmapArray();

    unsigned int get32BitKeyForToken(char *token);
    void get64BitKeyForToken(char *token, bit_64_t *key);
    unsigned char getNumBitsForByte(unsigned char byteVal) { return bitTable[byteVal]; }

// Implementation
protected:
    void buildBitTable();

    // the array that holds the bit map patterns for each possible
    // digraph. Each item in the array is an integer that has
    // one of its 32 bits turned on.
    unsigned int bitMapArray32[BITMAP_ARRAY_INDEX_SIZE][BITMAP_ARRAY_INDEX_SIZE];

    // the array that holds the bit map patterns for each possible
    // digraph. Each item in the array is an integer that has
    // one of its 64 bits turned on.
    bit_64_t bitMapArray64[BITMAP_ARRAY_INDEX_SIZE][BITMAP_ARRAY_INDEX_SIZE];

    unsigned char bitTable[256];
};

#endif



// NHDigraphBitmapArray.cpp : implementation file
//
// * 3/20/98 EFB Changed names to NH from SN

#include "NHDigraphBitmapArray.hpp"
#include <stdio.h>

#ifdef _DEBUG
#define new DEBUG_NEW
#undef THIS_FILE
static char THIS_FILE[] = __FILE__;
#endif

typedef unsigned char byte;

/////////////////////////////////////////////////////////////////////////////
// Constructor.
// Fills in the values in both of the bitMapArrays (32 bit and
// 64 bits).
NHDigraphBitmapArray::NHDigraphBitmapArray()
{
    unsigned int bitmapValue32 = 1;
    unsigned int bitmapValue64High = 0;
    unsigned int bitmapValue64Low = 1;

    for (int i = 0; i < BITMAP_ARRAY_INDEX_SIZE; i++) {
        for (int j = 0; j < BITMAP_ARRAY_INDEX_SIZE; j++) {
            // assign the 32 bit value
            bitMapArray32[i][j] = bitmapValue32;

            // assign the 64 bit value
            bitMapArray64[i][j].hiBytes = bitmapValue64High;
            bitMapArray64[i][j].lowBytes = bitmapValue64Low;

            // now shift the values
            bitmapValue32 <<= 1;
            if (bitmapValue32 == 0)
                bitmapValue32 = 1;

            if (bitmapValue64Low == 0) {
                bitmapValue64High <<= 1;
                if (bitmapValue64High == 0) {
                    bitmapValue64Low = 1;
                }
            }
            else {
                bitmapValue64Low <<= 1;
                if (bitmapValue64Low == 0) {
                    bitmapValue64High = 1;
                }
            }
        }
    }

    buildBitTable();
}



NHDigraphBitmapArray::~NHDigraphBitmapArray()
{
}



void NHDigraphBitmapArray::get64BitKeyForToken(char *token, bit_64_t *key)
{
    char *ch1;
    char *ch2;
    int index1;
    int index2;
    char spacedToken[200];

    // zero out the key we are going to return
    key->hiBytes = 0;
    key->lowBytes = 0;

    sprintf(spacedToken, " %s ", token);

    ch1 = spacedToken;
    if (*ch1 != '\0') {
        ch2 = ch1 + 1;
        while (*ch2 != '\0') {
            if (*ch1 == ' ')
                index1 = 26;
            else
                index1 = *ch1 - 'A';
            if (*ch2 == ' ')
                index2 = 26;
            else
                index2 = *ch2 - 'A';
            if ((index1 >= 0) && (index1 < BITMAP_ARRAY_INDEX_SIZE)
                && (index2 >= 0) && (index2 < BITMAP_ARRAY_INDEX_SIZE)) {
                key->hiBytes |= bitMapArray64[index1][index2].hiBytes;
                key->lowBytes |= bitMapArray64[index1][index2].lowBytes;
            }
            ch1 = ch2;
            ch2++;
        }
    }
}



unsigned int NHDigraphBitmapArray::get32BitKeyForToken(char *token)
{
    unsigned int retVal = 0;
    char *ch1;
    char *ch2;
    int index1;
    int index2;
    char spacedToken[200];

    sprintf(spacedToken, " %s ", token);

    ch1 = spacedToken;
    if (*ch1 != '\0') {
        ch2 = ch1 + 1;
        while (*ch2 != '\0') {
            if (*ch1 == ' ')
                index1 = 26;
            else
                index1 = *ch1 - 'A';
            if (*ch2 == ' ')
                index2 = 26;
            else
                index2 = *ch2 - 'A';
            if ((index1 >= 0) && (index1 < BITMAP_ARRAY_INDEX_SIZE)
                && (index2 >= 0) && (index2 < BITMAP_ARRAY_INDEX_SIZE))
                retVal |= bitMapArray32[index1][index2];
            ch1 = ch2;
            ch2++;
        }
    }
    return retVal;
}
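
Because the constructor rotates a single bit through the 27 x 27 table, the 32-bit pattern for a digraph depends only on its position in that table, so a key can be computed without storing the table at all. The sketch below shows that equivalence under stated assumptions: digraphIndex and digraphKey32 are hypothetical names, and every non-uppercase character is mapped to the space slot (as the header comment describes), whereas the listing's range check simply skips most such characters.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: 0-25 for 'A'-'Z', 26 (the space slot) for anything else.
int digraphIndex(char c)
{
    return (c >= 'A' && c <= 'Z') ? c - 'A' : 26;
}

// Hypothetical sketch: ORs together one bit per digraph of the padded token.
std::uint32_t digraphKey32(const std::string& token)
{
    // Pad with a space on each side so the first and last characters
    // each contribute a digraph of their own.
    const std::string padded = " " + token + " ";
    std::uint32_t key = 0;

    for (std::size_t i = 0; i + 1 < padded.size(); ++i) {
        // Position of this digraph in the 27 x 27 table, in fill order.
        const int cell = digraphIndex(padded[i]) * 27 + digraphIndex(padded[i + 1]);
        // The rotating bit returns to bit 0 every 32 cells.
        key |= std::uint32_t(1) << (cell % 32);
    }
    return key;
}
```

Two tokens that share a digraph always share at least one set bit, so a bitwise AND of their keys is nonzero; since 729 digraphs fold onto 32 bits, the converse does not hold, and a nonzero AND is only a hint that a full comparison may be worthwhile.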

// build a table that says how many bits a byte value
// has turned off.
void NHDigraphBitmapArray::buildBitTable()
{
    byte tempByte;
    int i, j;
    byte bitsTurnedOff;

    for (i = 0; i < 256; i++) {
        tempByte = i;
        bitsTurnedOff = 0;
        for (j = 0; j < 8; j++) {
            if (tempByte & 1)               // use this when array says how many 1's
//          if ((tempByte & 1) == 0)        // use this when array says how many 0's
                bitsTurnedOff++;
            tempByte >>= 1;
        }
        bitTable[i] = bitsTurnedOff;
    }
}

//
// File: NHCompParms.cpp
// Description:
//     Implementation to the NHCompParms class.
//
// History:
//     5/8/97   EFB   Created
//     3/3/98   EFB   Changed name of class, and move parms to the new
//                    NHNameParms class.
//     3/20/98  EFB   Changed names to NH from SN
//

#include <string.h>
#include <stdio.h>
#include <stdlib.h>

#include "NHCompParms.hpp"
#include "NHVariantTable.hpp"
#include "NHTAQTable.hpp"
#include "NH_variant_taq_globals.h"



NHCompParms::NHCompParms (NHParmsType parmsType)
{
    status = NH_SUCCESS;

    switch (parmsType) {
    case NH_PARMS_GENERIC:          // default
        scoreThresh = 0.6;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = true;
        matchSnIntial = false;
        gnInitialScore = 0.85;
        snInitialScore = 0.0;
        gnInitialOnInitialMatchScore = 1.0;
        snInitialOnInitialMatchScore = 0.0;
        useGnVariants = true;
        useSnVariants = true;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.0;
        gnOOPSFactor = 0.6;
        snOOPSFactor = 0.6;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = false;
        checkSnCompressedName = false;
        gnCompressedNameScore = 0.0;
        snCompressedNameScore = 0.0;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_AVG;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.5;
        snScoreThresh = 0.5;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;

    case NH_PARMS_ANGLO:
        scoreThresh = 0.6;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = true;
        matchSnIntial = false;
        gnInitialScore = 0.85;
        snInitialScore = 0.0;
        gnInitialOnInitialMatchScore = 1.0;
        snInitialOnInitialMatchScore = 0.0;
        useGnVariants = true;
        useSnVariants = true;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.0;
        gnOOPSFactor = 0.6;
        snOOPSFactor = 0.6;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = false;
        checkSnCompressedName = false;
        gnCompressedNameScore = 0.0;
        snCompressedNameScore = 0.0;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_AVG;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.5;
        snScoreThresh = 0.5;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;
    case NH_PARMS_ARABIC:
        scoreThresh = 0.63;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = true;
        matchSnIntial = true;
        gnInitialScore = 0.85;
        snInitialScore = 0.85;
        gnInitialOnInitialMatchScore = 1.0;
        snInitialOnInitialMatchScore = 1.0;
        useGnVariants = false;
        useSnVariants = false;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.0;
        gnOOPSFactor = 0.7;
        snOOPSFactor = 0.9;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = true;
        checkSnCompressedName = true;
        gnCompressedNameScore = 0.9;
        snCompressedNameScore = 0.9;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_AVG;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.63;
        snScoreThresh = 0.63;
        gnWeight = 1.0;
        snWeight = 0.8;
        break;

    case NH_PARMS_CHINESE:
        scoreThresh = 0.70;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = false;
        matchSnIntial = false;
        gnInitialScore = 0.0;
        snInitialScore = 0.0;
        gnInitialOnInitialMatchScore = 0.0;
        snInitialOnInitialMatchScore = 0.0;
        useGnVariants = true;
        useSnVariants = true;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.0;
        gnOOPSFactor = 0.0;
        snOOPSFactor = 1.0;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = false;
        checkSnCompressedName = false;
        gnCompressedNameScore = 0.0;
        snCompressedNameScore = 0.0;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_LOWEST;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.7;
        snScoreThresh = 0.7;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;

    case NH_PARMS_HISPANIC:
        scoreThresh = 0.60;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = true;
        matchSnIntial = true;
        gnInitialScore = 0.85;
        snInitialScore = 0.85;
        gnInitialOnInitialMatchScore = 1.0;
        snInitialOnInitialMatchScore = 1.0;
        useGnVariants = true;
        useSnVariants = true;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_FIRST;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.70;
        gnOOPSFactor = 0.6;
        snOOPSFactor = 0.6;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = true;
        checkSnCompressedName = true;
        gnCompressedNameScore = 0.9;
        snCompressedNameScore = 0.9;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_AVG;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.6;
        snScoreThresh = 0.6;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;



    case NH_PARMS_KOREAN:           // Parameters tuned for Korean names.
        scoreThresh = 0.66;
        useGnLeftBias = false;
        useSnLeftBias = false;
        matchGnIntial = false;
        matchSnIntial = false;
        gnInitialScore = 0.0;
        snInitialScore = 0.0;
        gnInitialOnInitialMatchScore = 0.0;
        snInitialOnInitialMatchScore = 0.0;
        useGnVariants = true;
        useSnVariants = true;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.0;
        snAnchorFactor = 0.0;
        gnOOPSFactor = 0.69;
        snOOPSFactor = 0.63;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = false;
        checkSnCompressedName = false;
        gnCompressedNameScore = 0.0;
        snCompressedNameScore = 0.0;
        scoreGnTaqs = true;
        scoreSnTaqs = true;
        gnSegmentScoreMode = NH_SEGMODE_AVG;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.69;
        snScoreThresh = 0.63;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;

    case NH_PARMS_RUSSIAN:          // Parameters tuned for Russian names.
        scoreThresh = 0.61;
        useGnLeftBias = false;
        useSnLeftBias = true;
        matchGnIntial = true;
        matchSnIntial = true;
        gnInitialScore = 0.85;
        snInitialScore = 0.85;
        gnInitialOnInitialMatchScore = 1.0;
        snInitialOnInitialMatchScore = 1.0;
        useGnVariants = false;
        useSnVariants = false;
        fnuScore = 0.60;
        nfnScore = 0.65;
        lnuScore = 0.6;
        nlnScore = 0.65;
        gnAnchorSegmentMode = NH_ANCHOR_SEG_FIRST;
        snAnchorSegmentMode = NH_ANCHOR_SEG_NONE;
        gnAnchorFactor = 0.60;
        snAnchorFactor = 0.00;
        gnOOPSFactor = 0.65;
        snOOPSFactor = 0.8;
        disGnTAQFactor = 0.7;
        absDelGnTAQFactor = 0.9;
        absDisGnTAQFactor = 0.8;
        delGnTAQFactor = 0.85;
        disSnTAQFactor = 0.7;
        absDelSnTAQFactor = 0.9;
        absDisSnTAQFactor = 0.8;
        delSnTAQFactor = 0.85;
        checkGnCompressedName = false;
        checkSnCompressedName = false;
        gnCompressedNameScore = 0.0;
        snCompressedNameScore = 0.0;
        gnSegmentScoreMode = NH_SEGMODE_HIGHEST;
        snSegmentScoreMode = NH_SEGMODE_AVG;
        gnScoreThresh = 0.6;
        snScoreThresh = 0.62;
        gnWeight = 0.8;
        snWeight = 1.0;
        break;
    } // end of switch
}



NHCompParms::NHCompParms (istream &inStream)
{
    int compParmsVersion;

    if (inStream.good()) {
        inStream.read((char *)&compParmsVersion, sizeof(int));

        inStream.read((char *)&scoreThresh, sizeof(double));
        inStream.read((char *)&useGnLeftBias, sizeof(bool));
        inStream.read((char *)&useSnLeftBias, sizeof(bool));
        inStream.read((char *)&matchGnIntial, sizeof(bool));
        inStream.read((char *)&matchSnIntial, sizeof(bool));
        inStream.read((char *)&gnInitialScore, sizeof(double));
        inStream.read((char *)&snInitialScore, sizeof(double));
        inStream.read((char *)&useGnVariants, sizeof(bool));
        inStream.read((char *)&useSnVariants, sizeof(bool));
        inStream.read((char *)&fnuScore, sizeof(double));
        inStream.read((char *)&nfnScore, sizeof(double));
        inStream.read((char *)&lnuScore, sizeof(double));
        inStream.read((char *)&nlnScore, sizeof(double));

        inStream.read((char *)&gnSegmentScoreMode, sizeof(NHSegScoreMode));
        inStream.read((char *)&snSegmentScoreMode, sizeof(NHSegScoreMode));
        inStream.read((char *)&gnAnchorSegmentMode, sizeof(NHAnchorSegMode));
        inStream.read((char *)&snAnchorSegmentMode, sizeof(NHAnchorSegMode));

        inStream.read((char *)&gnAnchorFactor, sizeof(double));
        inStream.read((char *)&snAnchorFactor, sizeof(double));
        inStream.read((char *)&gnOOPSFactor, sizeof(double));
        inStream.read((char *)&snOOPSFactor, sizeof(double));

        inStream.read((char *)&scoreGnTaqs, sizeof(bool));
        inStream.read((char *)&scoreSnTaqs, sizeof(bool));

        inStream.read((char *)&absDelGnTAQFactor, sizeof(double));
        inStream.read((char *)&absDisGnTAQFactor, sizeof(double));
        inStream.read((char *)&absDelSnTAQFactor, sizeof(double));
        inStream.read((char *)&absDisSnTAQFactor, sizeof(double));
        inStream.read((char *)&delGnTAQFactor, sizeof(double));
        inStream.read((char *)&delSnTAQFactor, sizeof(double));
        inStream.read((char *)&disGnTAQFactor, sizeof(double));
        inStream.read((char *)&disSnTAQFactor, sizeof(double));

        inStream.read((char *)&checkGnCompressedName, sizeof(bool));
        inStream.read((char *)&checkSnCompressedName, sizeof(bool));

        inStream.read((char *)&gnCompressedNameScore, sizeof(double));
        inStream.read((char *)&snCompressedNameScore, sizeof(double));

        inStream.read((char *)&gnScoreThresh, sizeof(double));
        inStream.read((char *)&snScoreThresh, sizeof(double));

        inStream.read((char *)&gnWeight, sizeof(double));
        inStream.read((char *)&snWeight, sizeof(double));

        inStream.read((char *)&gnInitialOnInitialMatchScore, sizeof(double));
        inStream.read((char *)&snInitialOnInitialMatchScore, sizeof(double));

        status = NH_SUCCESS;
    }
    else
        status = NH_COMP_PARMS_BAD_STREAM_ON_CONSTRUCT;
}

NHCompParms::~NHCompParms()
{
}



NHReturnCode NHCompParms::archiveData(ostream &outStream)
{
    //
    // comp parms file version history
    //
    //   1.0 - first version
    //
    int          compParmsVersion = 1;
    NHReturnCode rc = NH_SUCCESS;

    if (outStream.good()) {
        outStream.write((char *)&compParmsVersion, sizeof(int));

        outStream.write((char *)&scoreThresh, sizeof(double));
        outStream.write((char *)&useGnLeftBias, sizeof(bool));
        outStream.write((char *)&useSnLeftBias, sizeof(bool));
        outStream.write((char *)&matchGnIntial, sizeof(bool));
        outStream.write((char *)&matchSnIntial, sizeof(bool));
        outStream.write((char *)&gnInitialScore, sizeof(double));
        outStream.write((char *)&snInitialScore, sizeof(double));
        outStream.write((char *)&useGnVariants, sizeof(bool));
        outStream.write((char *)&useSnVariants, sizeof(bool));
        outStream.write((char *)&fnuScore, sizeof(double));
        outStream.write((char *)&nfnScore, sizeof(double));
        outStream.write((char *)&lnuScore, sizeof(double));
        outStream.write((char *)&nlnScore, sizeof(double));

        outStream.write((char *)&gnSegmentScoreMode,
                        sizeof(NHSegScoreMode));
        outStream.write((char *)&snSegmentScoreMode,
                        sizeof(NHSegScoreMode));
        outStream.write((char *)&gnAnchorSegmentMode,
                        sizeof(NHAnchorSegMode));
        outStream.write((char *)&snAnchorSegmentMode,
                        sizeof(NHAnchorSegMode));

        outStream.write((char *)&gnAnchorFactor, sizeof(double));
        outStream.write((char *)&snAnchorFactor, sizeof(double));
        outStream.write((char *)&gnOOPSFactor, sizeof(double));
        outStream.write((char *)&snOOPSFactor, sizeof(double));

        outStream.write((char *)&scoreGnTaqs, sizeof(bool));
        outStream.write((char *)&scoreSnTaqs, sizeof(bool));

        outStream.write((char *)&absDelGnTAQFactor, sizeof(double));
        outStream.write((char *)&absDisGnTAQFactor, sizeof(double));
        outStream.write((char *)&absDelSnTAQFactor, sizeof(double));
        outStream.write((char *)&absDisSnTAQFactor, sizeof(double));
        outStream.write((char *)&delGnTAQFactor, sizeof(double));
        outStream.write((char *)&delSnTAQFactor, sizeof(double));
        outStream.write((char *)&disGnTAQFactor, sizeof(double));
        outStream.write((char *)&disSnTAQFactor, sizeof(double));

        outStream.write((char *)&checkGnCompressedName,
                        sizeof(bool));
        outStream.write((char *)&checkSnCompressedName,
                        sizeof(bool));
        outStream.write((char *)&gnCompressedNameScore,
                        sizeof(double));
        outStream.write((char *)&snCompressedNameScore,
                        sizeof(double));

        outStream.write((char *)&gnScoreThresh, sizeof(double));
        outStream.write((char *)&snScoreThresh, sizeof(double));

        outStream.write((char *)&gnWeight, sizeof(double));
        outStream.write((char *)&snWeight, sizeof(double));

        outStream.write((char *)&gnInitialOnInitialMatchScore,
                        sizeof(double));
        outStream.write((char *)&snInitialOnInitialMatchScore,
                        sizeof(double));
    }
    else
        rc = NH_COMP_PARMS_BAD_STREAM_ON_ARCHIVE;

    return rc;
}



NHReturnCode NHCompParms::setScoreThresh(double aThresh)
{
    NHReturnCode errorCode;

    if ((aThresh < 0.0) || (aThresh > 1.0))
        errorCode = NH_INVALID_SCORE_THRESH;
    else {
        scoreThresh = aThresh;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

void NHCompParms::setUseGnLeftBias(bool aBool)
{
    useGnLeftBias = aBool;
}

void NHCompParms::setUseSnLeftBias(bool aBool)
{
    useSnLeftBias = aBool;
}

void NHCompParms::setMatchGnIntial(bool aBool)
{
    matchGnIntial = aBool;
}

void NHCompParms::setMatchSnIntial(bool aBool)
{
    matchSnIntial = aBool;
}

NHReturnCode NHCompParms::setGnInitialScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_GN_INIT_SCORE;
    else {
        gnInitialScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnInitialScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_NH_INIT_SCORE;
    else {
        snInitialScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setGnInitialOnInitialMatchScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_GN_INIT_ON_INIT_MATCH_SCORE;
    else {
        gnInitialOnInitialMatchScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnInitialOnInitialMatchScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_NH_INIT_ON_INIT_MATCH_SCORE;
    else {
        snInitialOnInitialMatchScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



void NHCompParms::setUseGnVariants(bool aBool)
{
    useGnVariants = aBool;
}

void NHCompParms::setUseSnVariants(bool aBool)
{
    useSnVariants = aBool;
}

NHReturnCode NHCompParms::setNFNScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_NFN_SCORE;
    else {
        nfnScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setFNUScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_FNU_SCORE;
    else {
        fnuScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setNLNScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_NLN_SCORE;
    else {
        nlnScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setLNUScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_LNU_SCORE;
    else {
        lnuScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



NHReturnCode NHCompParms::setGnScoreThresh(double aThresh)
{
    NHReturnCode errorCode;

    if ((aThresh < 0.0) || (aThresh > 1.0))
        errorCode = NH_INVALID_GN_THRESH;
    else {
        gnScoreThresh = aThresh;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnScoreThresh(double aThresh)
{
    NHReturnCode errorCode;

    if ((aThresh < 0.0) || (aThresh > 1.0))
        errorCode = NH_INVALID_NH_THRESH;
    else {
        snScoreThresh = aThresh;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setGnWeight(double aWeight)
{
    NHReturnCode errorCode;

    if ((aWeight < 0.0) || (aWeight > 1.0))
        errorCode = NH_INVALID_GN_WEIGHT;
    else {
        gnWeight = aWeight;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnWeight(double aWeight)
{
    NHReturnCode errorCode;

    if ((aWeight < 0.0) || (aWeight > 1.0))
        errorCode = NH_INVALID_NH_WEIGHT;
    else {
        snWeight = aWeight;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



void NHCompParms::setGnSegmentScoreMode(NHSegScoreMode aMode)
{
    gnSegmentScoreMode = aMode;
}

void NHCompParms::setSnSegmentScoreMode(NHSegScoreMode aMode)
{
    snSegmentScoreMode = aMode;
}

void NHCompParms::setGnAnchorSegmentMode(NHAnchorSegMode anAnchorMode)
{
    gnAnchorSegmentMode = anAnchorMode;
}

void NHCompParms::setSnAnchorSegmentMode(NHAnchorSegMode anAnchorMode)
{
    snAnchorSegmentMode = anAnchorMode;
}

NHReturnCode NHCompParms::setGnAnchorFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_GN_ANCHOR_FACTOR;
    else {
        gnAnchorFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnAnchorFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_NH_ANCHOR_FACTOR;
    else {
        snAnchorFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setGnOOPSFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_GN_OOPS_FACTOR;
    else {
        gnOOPSFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnOOPSFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_NH_OOPS_FACTOR;
    else {
        snOOPSFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



NHReturnCode NHCompParms::setAbsDelGnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_ABS_DEL_GN_TAQ_FACTOR;
    else {
        absDelGnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setAbsDisGnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_ABS_DIS_GN_TAQ_FACTOR;
    else {
        absDisGnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setAbsDelSnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_ABS_DEL_NH_TAQ_FACTOR;
    else {
        absDelSnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setAbsDisSnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_ABS_DIS_NH_TAQ_FACTOR;
    else {
        absDisSnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setDelGnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_DEL_GN_TAQ_FACTOR;
    else {
        delGnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setDelSnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_DEL_NH_TAQ_FACTOR;
    else {
        delSnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setDisGnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_DIS_GN_TAQ_FACTOR;
    else {
        disGnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setDisSnTAQFactor(double aFactor)
{
    NHReturnCode errorCode;

    if ((aFactor < 0.0) || (aFactor > 1.0))
        errorCode = NH_INVALID_DIS_NH_TAQ_FACTOR;
    else {
        disSnTAQFactor = aFactor;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



void NHCompParms::setScoreGnTAQs(bool aBool)
{
    scoreGnTaqs = aBool;
}

void NHCompParms::setScoreSnTAQs(bool aBool)
{
    scoreSnTaqs = aBool;
}

void NHCompParms::setCheckGnCompressedName(bool aBool)
{
    checkGnCompressedName = aBool;
}

void NHCompParms::setCheckSnCompressedName(bool aBool)
{
    checkSnCompressedName = aBool;
}

NHReturnCode NHCompParms::setGnCompressedNameScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_GN_COMPRESSED_NAME_SCORE;
    else {
        gnCompressedNameScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}

NHReturnCode NHCompParms::setSnCompressedNameScore(double aScore)
{
    NHReturnCode errorCode;

    if ((aScore < 0.0) || (aScore > 1.0))
        errorCode = NH_INVALID_NH_COMPRESSED_NAME_SCORE;
    else {
        snCompressedNameScore = aScore;
        errorCode = NH_SUCCESS;
    }

    return errorCode;
}



bool NHCompParms::operator==(NHCompParms &other)
{
    bool rc;

    rc = ((scoreThresh == other.scoreThresh) &&
          (useGnLeftBias == other.useGnLeftBias) &&
          (useSnLeftBias == other.useSnLeftBias) &&
          (matchGnIntial == other.matchGnIntial) &&
          (matchSnIntial == other.matchSnIntial) &&
          (gnInitialScore == other.gnInitialScore) &&
          (snInitialScore == other.snInitialScore) &&
          (useGnVariants == other.useGnVariants) &&
          (useSnVariants == other.useSnVariants) &&
          (fnuScore == other.fnuScore) &&
          (nfnScore == other.nfnScore) &&
          (lnuScore == other.lnuScore) &&
          (nlnScore == other.nlnScore) &&
          (gnSegmentScoreMode == other.gnSegmentScoreMode) &&
          (snSegmentScoreMode == other.snSegmentScoreMode) &&
          (gnAnchorSegmentMode == other.gnAnchorSegmentMode) &&
          (snAnchorSegmentMode == other.snAnchorSegmentMode) &&
          (gnAnchorFactor == other.gnAnchorFactor) &&
          (snAnchorFactor == other.snAnchorFactor) &&
          (gnOOPSFactor == other.gnOOPSFactor) &&
          (snOOPSFactor == other.snOOPSFactor) &&
          (gnWeight == other.gnWeight) &&
          (snWeight == other.snWeight) &&
          (gnScoreThresh == other.gnScoreThresh) &&
          (snScoreThresh == other.snScoreThresh) &&
          (scoreGnTaqs == other.scoreGnTaqs) &&
          (scoreSnTaqs == other.scoreSnTaqs) &&
          (absDelGnTAQFactor == other.absDelGnTAQFactor) &&
          (absDisGnTAQFactor == other.absDisGnTAQFactor) &&
          (absDelSnTAQFactor == other.absDelSnTAQFactor) &&
          (absDisSnTAQFactor == other.absDisSnTAQFactor) &&
          (delGnTAQFactor == other.delGnTAQFactor) &&
          (delSnTAQFactor == other.delSnTAQFactor) &&
          (disGnTAQFactor == other.disGnTAQFactor) &&
          (disSnTAQFactor == other.disSnTAQFactor) &&
          (checkGnCompressedName == other.checkGnCompressedName) &&
          (checkSnCompressedName == other.checkSnCompressedName) &&
          (gnCompressedNameScore == other.gnCompressedNameScore) &&
          (snCompressedNameScore == other.snCompressedNameScore) &&
          (gnInitialOnInitialMatchScore ==
           other.gnInitialOnInitialMatchScore) &&
          (snInitialOnInitialMatchScore ==
           other.snInitialOnInitialMatchScore));

    return rc;
}



NHReturnCode NHCompParms::getStatus()
{
    return status;
}
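
The stream constructor and archiveData() in the listing above only work as a pair if both visit the fields in the same order with the same sizes. The following is a minimal sketch of that round trip, not the real class: DemoParms, archiveDemo, restoreDemo and roundTripDemo are hypothetical names, only three fields are modelled, and the in-memory stringstream is an assumption made for illustration.

```cpp
#include <istream>
#include <ostream>
#include <sstream>

// Hypothetical three-field stand-in for NHCompParms (not the real class).
struct DemoParms {
    double scoreThresh;
    bool   useGnLeftBias;
    double gnWeight;
};

// Write a version tag, then each field, in a fixed order.
bool archiveDemo(const DemoParms &p, std::ostream &out)
{
    if (!out.good())
        return false;
    const int version = 1;
    out.write(reinterpret_cast<const char *>(&version), sizeof(version));
    out.write(reinterpret_cast<const char *>(&p.scoreThresh), sizeof(p.scoreThresh));
    out.write(reinterpret_cast<const char *>(&p.useGnLeftBias), sizeof(p.useGnLeftBias));
    out.write(reinterpret_cast<const char *>(&p.gnWeight), sizeof(p.gnWeight));
    return out.good();
}

// Read the fields back in exactly the order they were written.
bool restoreDemo(DemoParms &p, std::istream &in)
{
    if (!in.good())
        return false;
    int version = 0;
    in.read(reinterpret_cast<char *>(&version), sizeof(version));
    in.read(reinterpret_cast<char *>(&p.scoreThresh), sizeof(p.scoreThresh));
    in.read(reinterpret_cast<char *>(&p.useGnLeftBias), sizeof(p.useGnLeftBias));
    in.read(reinterpret_cast<char *>(&p.gnWeight), sizeof(p.gnWeight));
    return in.good() && version == 1;
}

// Archive to an in-memory binary stream and restore from it.
bool roundTripDemo(const DemoParms &original, DemoParms &copy)
{
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    return archiveDemo(original, buf) && restoreDemo(copy, buf);
}
```

Because the fields are written as raw memory, a file produced this way can only be relied on by a build with the same type sizes and byte order. The version tag written first gives a reader a place to detect a layout change.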



// File: NH_variant_taq_globals.h
//
// Description:
//
// Functions to manage the global variant and TAQ resources.
// We manage the TAQ and variant tables as global resources
// so that each SNCompParms object does not need to create its
// own copy of them. We provide these global functions so that
// we can control the variables in one location.
//
//
// History:
//
// 9/08/97 EFB Created
// 3/20/98 EFB Changed names to NH from SN
//

#ifndef NH_VARIANT_TAQ_GLOBALS_DEFFED
#define NH_VARIANT_TAQ_GLOBALS_DEFFED

#include "NH_culture_codes.h"

// function to return pointers to the global SN and GN Variant Tables
NHVariantTable *NH_getVariantTable(NH_VARIANT_TABLE_TYPES
                                   variantTableType);

NHTAQTable *NH_getTAQTable();

#endif



//
// File: NH_variant_taq_globals.cpp
//
// Description:
//
// Functions to manage the global variant and TAQ resources.
// We manage the TAQ and variant tables as global resources
// so that each NHCompParms object does not need to create its
// own copy of them. We provide these global functions so that
// we can control the variables in one location.
//
// We should provide some sort of thread protection around these
// resources to make sure that two competing threads do not attempt
// to grab these resources during creation time. How can we do this
// portably?
//
// History:
//
// 9/08/97 EFB Created
// 3/20/98 EFB Changed names to NH from SN
//

#include <string.h>

#include "NH_util.hpp"
#include "NHVariantTable.hpp"
#include "NHTAQTable.hpp"

#include "NH_variant_taq_globals.h"

// define SN and GN variant tables
NHVariantTable *NH_snVariantTable = NULL;
NHVariantTable *NH_gnVariantTable = NULL;

// define a single TAQ table
NHTAQTable *NH_taqTable = NULL;

// functions to create and return pointers to the tables

NHVariantTable *NH_getVariantTable(NH_VARIANT_TABLE_TYPES variantTableType)
{
    NHVariantTable *tablePtr;
    NHVariantTable **tablePtrPtr = NULL;

    switch (variantTableType) {

    case NH_SURNAME_VARIANTS:
        tablePtr = NH_snVariantTable;
        tablePtrPtr = &NH_snVariantTable;
        break;

    case NH_GIVENNAME_VARIANTS:
        tablePtr = NH_gnVariantTable;
        tablePtrPtr = &NH_gnVariantTable;
        break;

    default:
        tablePtr = NULL;
    }

    if (tablePtr == NULL) {
        tablePtr = new NHVariantTable(variantTableType);    // create the table
        if (tablePtrPtr != NULL)
            *tablePtrPtr = tablePtr;    // assign the global variable
    }

    return tablePtr;
}

NHTAQTable *NH_getTAQTable()
{
    if (NH_taqTable == NULL) {
        NH_taqTable = new NHTAQTable(NH_PRODUCTION_TAQ_TABLE);    // create the table
    }

    return NH_taqTable;
}
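
The file header above asks how creation of the shared tables could be protected against competing threads in a portable way. One possible answer, available only in C++11 and later (so it postdates this listing), is a function-local static, whose initialization the language guarantees to run exactly once even when several threads make the first call together. The sketch below illustrates that idea only. DemoTable, DemoTableType and getDemoTable are hypothetical stand-ins, not the NHVariantTable interface.

```cpp
// Hypothetical stand-ins for NH_VARIANT_TABLE_TYPES and NHVariantTable.
enum DemoTableType { DEMO_SURNAME_VARIANTS, DEMO_GIVENNAME_VARIANTS };

struct DemoTable {
    DemoTableType type;
    explicit DemoTable(DemoTableType aType) : type(aType) {}
};

// Return the shared table for the requested type, creating it on first use.
// Each function-local static is initialized once; since C++11 that
// initialization is safe even if several threads arrive at the same time.
DemoTable *getDemoTable(DemoTableType type)
{
    switch (type) {
    case DEMO_SURNAME_VARIANTS: {
        static DemoTable *snTable = new DemoTable(DEMO_SURNAME_VARIANTS);
        return snTable;
    }
    case DEMO_GIVENNAME_VARIANTS: {
        static DemoTable *gnTable = new DemoTable(DEMO_GIVENNAME_VARIANTS);
        return gnTable;
    }
    }
    return nullptr;     // unknown table type: nothing is created
}
```

Unlike the listing, this sketch returns a null pointer for an unrecognized table type instead of allocating a table that no global keeps track of. That is a design choice of the sketch, not behavior taken from the original code.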



//
// File: NH_util.cpp
//
// Description:
//
// Implementation of various utility functions used in the SNAPI
//
// History:
//
// 5/15/97 EFB Created
// 3/20/98 EFB Changed names to NH from SN
//

#include <string.h>

#include "NH_util.hpp"
#include "NHCompParms.hpp"



// function to remove leading and trailing spaces from a string
// in place.
// Strips the string at either end or both ends.
// Stripchars specify the characters that should
// be stripped. We start by seeing if they want the
// trailing chars stripped, which is easy. We simply
// work backwards from the end of the string, looking for
// the first non-strippable character, and terminate the
// string just past that character. Then if they wanted
// leading chars stripped, we work forwards to the first
// non-strippable char, and then move that and each following
// char to the beginning of the string.
void NH_strip(char *aString)
{
    char *end_point;
    char *ch;
    int len;

    if ((len = strlen(aString)) != 0) {     // if there is a string
        // start at end
        end_point = aString + len - 1;

        // and work back till we get a non-space or get to
        // the beginning of our string, chopping off what's left.
        // Also make sure we don't zoom right past the beginning of the
        // string.
        for (; strchr(NH_DEFAULT_WHITESPACE, *end_point) != NULL &&
               end_point != aString; end_point--)
            ;

        // if string was all whitespace
        if ((end_point == aString) && strchr(NH_DEFAULT_WHITESPACE,
                                             *aString) != NULL)
            *aString = EOS;     // erase it all, and we're done, could return here
        else
            *(end_point + 1) = EOS;     // just chop off excess blanks

        // make sure there is still a string, since it might
        // have been stripped entirely above.
        if (*aString) {
            // now find first non space. we know string has at least one
            // nonwhite space, so we don't have to check for NULL.
            for (ch = aString; strchr(NH_DEFAULT_WHITESPACE, *ch) != NULL; ch++)
                ;

            if (ch != aString) {    // if there were leading spaces, move the block back
                char *target = aString;
                while (*ch != EOS) {
                    *target = *ch;
                    target++;
                    ch++;
                }
                // and get the null char also
                *target = *ch;
            }   // end if (are there leading spaces?)
        }   // end if (any text left?)
    }   // end (is there a string at all?)
}

char *NH_strrchr(char *stringStart, char *searchPos, char searchChar)
{
    while (1) {
        if (*searchPos == searchChar)
            break;

        if (searchPos == stringStart) {
            searchPos = NULL;   // string not found, so return NULL
            break;
        }

        searchPos--;
    }

    return searchPos;
}
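
The two-pass approach described in the NH_strip comment (trim the tail first, then shift the remainder over any leading whitespace) can also be written with the standard C string helpers. The sketch below is an illustration under stated assumptions, not a replacement for the listing: demoStrip and DEMO_WHITESPACE are hypothetical names, and the whitespace set is a guess because the definition of NH_DEFAULT_WHITESPACE is not shown in this listing.

```cpp
#include <cstring>

// Assumed stand-in for NH_DEFAULT_WHITESPACE, whose definition is not shown.
static const char DEMO_WHITESPACE[] = " \t\r\n";

// Strip leading and trailing whitespace from a C string in place.
void demoStrip(char *s)
{
    if (s == nullptr)
        return;

    // Trailing pass: walk back from the end past any whitespace.
    std::size_t len = std::strlen(s);
    while (len > 0 && std::strchr(DEMO_WHITESPACE, s[len - 1]) != nullptr)
        --len;
    s[len] = '\0';

    // Leading pass: count whitespace at the front, then shift the rest down.
    const std::size_t lead = std::strspn(s, DEMO_WHITESPACE);
    if (lead > 0)
        std::memmove(s, s + lead, len - lead + 1);  // +1 moves the terminator too
}
```

A string made up entirely of whitespace is reduced to the empty string by the trailing pass alone, so the leading pass then finds nothing to move.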



//
// File: NH_queens_arrays.hpp
//
// Description:
//
// Contains global definitions and declarations for the
// combinations of indexes for the best score calculation
//
// History:
//
// 6/4/97  EFB Created
// 3/20/98 EFB Changed names to NH from SN
//

typedef unsigned char byte;


byte twoByTwo[] = { 1, 0,
                    0, 1};

byte twoByThree[] = { 1, 2,
                      1, 0,
                      2, 1,
                      2, 0,
                      0, 1,
                      0, 2};

byte twoByFour[] = { 1, 2,
                     1, 3,
                     1, 0,
                     2, 1,
                     2, 3,
                     2, 0,
                     3, 1,
                     3, 2,
                     3, 0,
                     0, 1,
                     0, 2,
                     0, 3};

byte twoByFive[] = { 1, 2,
                     1, 3,
                     1, 4,
                     1, 0,
                     2, 1,
                     2, 3,
                     2, 4,
                     2, 0,
                     3, 1,
                     3, 2,
                     3, 4,
                     3, 0,
                     4, 1,
                     4, 2,
                     4, 3,
                     4, 0,
                     0, 1,
                     0, 2,
                     0, 3,
                     0, 4};

byte threeByThree[] = { 1, 2, 0,
                        1, 0, 2,
                        2, 1, 0,
                        2, 0, 1,
                        0, 1, 2,
                        0, 2, 1};

byte threeByFour[] = { 1, 2, 3,
                       1, 2, 0,
                       1, 3, 2,
                       1, 3, 0,
                       1, 0, 2,
                       1, 0, 3,
                       2, 1, 3,
                       2, 1, 0,
                       2, 3, 1,
                       2, 3, 0,
                       2, 0, 1,
                       2, 0, 3,
                       3, 1, 2,
                       3, 1, 0,
                       3, 2, 1,
                       3, 2, 0,
                       3, 0, 1,
                       3, 0, 2,
                       0, 1, 2,
                       0, 1, 3,
                       0, 2, 1,
                       0, 2, 3,
                       0, 3, 1,
                       0, 3, 2};

byte threeByFive[] = { 1, 2, 3,
                       1, 2, 4,
                       1, 2, 0,
                       1, 3, 2,
                       1, 3, 4,
                       1, 3, 0,
                       1, 4, 2,
                       1, 4, 3,
                       1, 4, 0,
                       1, 0, 2,
                       1, 0, 3,
                       1, 0, 4,
                       2, 1, 3,
                       2, 1, 4,
                       2, 1, 0,
                       2, 3, 1,
                       2, 3, 4,
                       2, 3, 0,
                       2, 4, 1,
                       2, 4, 3,
                       2, 4, 0,
                       2, 0, 1,
                       2, 0, 3,
                       2, 0, 4,
                       3, 1, 2,
                       3, 1, 4,
                       3, 1, 0,
                       3, 2, 1,
                       3, 2, 4,
                       3, 2, 0,
                       3, 4, 1,
                       3, 4, 2,
                       3, 4, 0,
                       3, 0, 1,
                       3, 0, 2,
                       3, 0, 4,
                       4, 1, 2,
                       4, 1, 3,
                       4, 1, 0,
                       4, 2, 1,
                       4, 2, 3,
                       4, 2, 0,
                       4, 3, 1,
                       4, 3, 2,
                       4, 3, 0,
                       4, 0, 1,
                       4, 0, 2,
                       4, 0, 3,
                       0, 1, 2,
                       0, 1, 3,
                       0, 1, 4,
                       0, 2, 1,
                       0, 2, 3,
                       0, 2, 4,
                       0, 3, 1,
                       0, 3, 2,
                       0, 3, 4,
                       0, 4, 1,
                       0, 4, 2,
                       0, 4, 3};

byte fourByFour[] = { 1, 2, 3, 0,
                      1, 2, 0, 3,
                      1, 3, 0, 2,
                      1, 3, 2, 0,
                      1, 0, 2, 3,
                      1, 0, 3, 2,
                      2, 1, 3, 0,
                      2, 1, 0, 3,
                      2, 3, 1, 0,
                      2, 3, 0, 1,
                      2, 0, 1, 3,
                      2, 0, 3, 1,
                      3, 1, 2, 0,
                      3, 1, 0, 2,
                      3, 2, 1, 0,
                      3, 2, 0, 1,
                      3, 0, 1, 2,
                      3, 0, 2, 1,
                      0, 1, 2, 3,
                      0, 1, 3, 2,
                      0, 2, 1, 3,
                      0, 2, 3, 1,
                      0, 3, 1, 2,
                      0, 3, 2, 1};

byte fourByFive[] = { 1, 2, 3, 4,
                      1, 2, 3, 0,
                      1, 2, 4, 3,
                      1, 2, 4, 0,
                      1, 2, 0, 3,
                      1, 2, 0, 4,
                      1, 3, 2, 4,
                      1, 3, 2, 0,
                      1, 3, 4, 2,
                      1, 3, 4, 0,
                      1, 3, 0, 2,
                      1, 3, 0, 4,
                      1, 4, 2, 3,
                      1, 4, 2, 0,
                      1, 4, 3, 2,
                      1, 4, 3, 0,
                      1, 4, 0, 2,
                      1, 4, 0, 3,
                      1, 0, 2, 3,
                      1, 0, 2, 4,
                      1, 0, 3, 2,
                      1, 0, 3, 4,
                      1, 0, 4, 2,
                      1, 0, 4, 3,
                      2, 1, 3, 4,
                      2, 1, 3, 0,
                      2, 1, 4, 3,
                      2, 1, 4, 0,
                      2, 1, 0, 3,
                      2, 1, 0, 4,
                      2, 3, 1, 4,
                      2, 3, 1, 0,
                      2, 3, 4, 1,
                      2, 3, 4, 0,
                      2, 3, 0, 1,
                      2, 3, 0, 4,
                      2, 4, 1, 3,
                      2, 4, 1, 0,
                      2, 4, 3, 1,
                      2, 4, 3, 0,
                      2, 4, 0, 1,
                      2, 4, 0, 3,
                      2, 0, 1, 3,
                      2, 0, 1, 4,
                      2, 0, 3, 1,
                      2, 0, 3, 4,
                      2, 0, 4, 1,
                      2, 0, 4, 3,
                      3, 2, 1, 4,
                      3, 2, 1, 0,
                      3, 2, 4, 1,
                      3, 2, 4, 0,
                      3, 2, 0, 1,
                      3, 2, 0, 4,
                      3, 1, 2, 4,
                      3, 1, 2, 0,
                      3, 1, 4, 2,
                      3, 1, 4, 0,
                      3, 1, 0, 2,
                      3, 1, 0, 4,
                      3, 4, 2, 1,
                      3, 4, 2, 0,
                      3, 4, 1, 2,
                      3, 4, 1, 0,
                      3, 4, 0, 1,
                      3, 4, 0, 2,
                      3, 0, 2, 1,
                      3, 0, 2, 4,
                      3, 0, 1, 2,
                      3, 0, 1, 4,
                      3, 0, 4, 2,
                      3, 0, 4, 1,
                      4, 2, 3, 1,
                      4, 2, 3, 0,
                      4, 2, 1, 3,
                      4, 2, 1, 0,
                      4, 2, 0, 3,
                      4, 2, 0, 1,
                      4, 3, 2, 1,
                      4, 3, 2, 0,
                      4, 3, 1, 2,
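
Each table in this header appears to list every ordered arrangement of k distinct indexes chosen from 0..n-1 (twoByThree has 6 rows, threeByFive has 60). As a hedged illustration of that structure, the sketch below builds such a table at run time. makeIndexTable and extendRow are hypothetical names, and the rows come out in ascending order, which differs from the hand-written order in the listing (where each position starts at 1 and ends with 0).

```cpp
#include <cstddef>
#include <vector>

// Recursively extend the current row until it holds k indexes, then
// append the finished row to the flat output table.
static void extendRow(std::vector<unsigned char> &row, std::vector<bool> &used,
                      std::size_t k, std::vector<unsigned char> &out)
{
    if (row.size() == k) {
        out.insert(out.end(), row.begin(), row.end());
        return;
    }
    for (std::size_t i = 0; i < used.size(); ++i) {
        if (used[i])
            continue;               // each index may appear once per row
        used[i] = true;
        row.push_back(static_cast<unsigned char>(i));
        extendRow(row, used, k, out);
        row.pop_back();
        used[i] = false;
    }
}

// Build a flat table of all ordered arrangements of k distinct indexes
// drawn from 0..n-1; each consecutive group of k bytes is one row.
std::vector<unsigned char> makeIndexTable(std::size_t k, std::size_t n)
{
    std::vector<unsigned char> out;
    if (k > n)
        return out;                 // no arrangement exists
    std::vector<unsigned char> row;
    std::vector<bool> used(n, false);
    extendRow(row, used, k, out);
    return out;
}
```

For k = 4 and n = 5 this yields 120 rows of 4 bytes, which can serve as a cross-check on the size of a hand-typed table.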



/* Generated by VariantManager */ 
addVariant("ANN","ANITA",0.85,"E ");
addVariant("ANN","ANA",0.85,"E ");
addVariant("ANN","ANNIE",0.90,"E ");
addVariant("ANN","ANNA",0.85,"E ");
addVariant("ANN","ANNE",0.95,"E");
addVariant("ANN","ANNETTE",0.85,"E ");



/* Generated by VariantManager */ 
addVariant("SON","SWUN",0.95,"C ");
addVariant("SON","SHON",0.95,"K ");
addVariant("SON","SOHN",0.95,"K ");
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